Abstract

In continuation of earlier works, where the problem of joint information embedding
and lossless compression (of the composite signal) was studied in the absence [8] and
in the presence [9] of attacks, here we consider the additional ingredient of protecting
the secrecy of the watermark against an unauthorized party, which has no access to a 
secret key shared by the legitimate parties. In other words, we study the problem of 
joint coding for three objectives: information embedding, compression, and encryption. 
Our main result is a coding theorem that provides a single-letter characterization of the 
best achievable tradeoffs among the following parameters: the distortion between the 
composite signal and the covertext, the distortion in reconstructing the watermark by 
the legitimate receiver, the compressibility of the composite signal (with and without 
the key), and the equivocation of the watermark, as well as its reconstructed version, 
given the composite signal. In the attack-free case, if the key is independent of the 
covertext, this coding theorem gives rise to a threefold separation principle, which states
that asymptotically, for long block codes, no optimality is lost by first applying a
rate-distortion code to the watermark source, then encrypting the compressed codeword,
and finally, embedding it into the covertext using the embedding scheme of [8]. In the
more general case, however, this separation principle is no longer valid, as the key plays 
an additional role of side information used by the embedding unit. 



Index Terms: Information hiding, watermarking, encryption, data compression,
separation principle, side information, equivocation, rate-distortion.



1 Introduction 



It is common to say that encryption and watermarking (or information hiding) are related 
but they are substantially different in the sense that in the former, the goal is to protect 
the secrecy of the contents of information, whereas in the latter, it is the very existence of 
this information that is to be kept secret. 

In the last few years, however, we have witnessed increasing efforts around the
combination of encryption and watermarking, motivated by the desire to further enhance
the security of sensitive information that is being hidden in the host signal. This is to
guarantee that even if the watermark is somehow detected by a hostile party, its contents
still remain secure due to the encryption. This combination of watermarking and
encryption can be seen both in recently reported research work (see, e.g., [2] and
references therein) and in actual technologies used in commercial products with a copyright
protection framework, such as the CD and the DVD. Also, some commercial companies
that provide Internet documents have, on their websites, links to copyright warning
messages saying that their data are protected by digitally encrypted watermarks (see, e.g.,
http://genealogy.lv/1864Lancaster/copyright.htm).

This paper is devoted to the information-theoretic aspects of joint watermarking and 
encryption together with lossless compression of the composite signal that contains the 
encrypted watermark. Specifically, we extend the framework studied in [8] and [9] of joint
watermarking and compression, so as to include encryption using a secret key. Before we
describe the setting of this paper concretely, we pause to give some more detailed
background on the work reported in [8] and [9].

In [8], the following problem was studied: Given a covertext source vector
$X^n = (X_1, \ldots, X_n)$, generated by a discrete memoryless source (DMS), and a message $m$,
uniformly distributed in $\{1, 2, \ldots, 2^{nR_e}\}$, independently of $X^n$, with $R_e$
designating the embedding rate, we wish to generate a composite (stegotext) vector
$Y^n = (Y_1, \ldots, Y_n)$ that satisfies the following requirements: (i) similarity to the
covertext (for reasons of maintaining quality), in the sense that a distortion constraint,
$E d(X^n, Y^n) = \sum_{t=1}^{n} E d(X_t, Y_t) \le nD$, holds, (ii) compressibility (for reasons
of saving storage space and bandwidth), in the sense that the normalized entropy, $H(Y^n)/n$,
does not exceed some threshold $R_c$, and (iii) reliability in decoding the message $m$ from
$Y^n$, in the sense that the decoding error probability is arbitrarily small for large $n$.
A single-letter characterization of the best achievable tradeoffs among $R_c$, $R_e$, and $D$
was given in [8], and was shown to be achievable by an extension of the ordinary lossy source
coding theorem, giving rise to the existence of $2^{nR_e}$ disjoint rate-distortion codebooks
(one for each possible watermark message) as long as $R_e$ does not exceed a certain
fundamental limit. In [9], this setup was extended to include a given memoryless attack
channel, $P(Z^n|Y^n)$, where item (iii) above was redefined such that the decoding was based
on $Z^n$ rather than on $Y^n$, and where, in view of requirement (ii), it is understood that
the attacker has access to the compressed version of $Y^n$, and so, the attacker decompresses
$Y^n$ before the attack and re-compresses it after. This extension from [8] to [9] involved a
different approach, which was in the spirit of the Gel'fand-Pinsker coding theorem for a
channel with non-causal side information (SI) at the transmitter. The role of SI, in this
case, was played by the covertext.

In this paper, we extend the settings of [8] and [9] to include encryption. For the sake
of clarity of the exposition, we do that in several steps. 

In the first step, we extend the attack-free setting of [8]: In addition to including
encryption, we also extend the model of the watermark message source to be an arbitrary
DMS, $U_1, U_2, \ldots$, independent of the covertext, and not necessarily a binary symmetric
source (BSS) as in [8] and [9]. Specifically, we now assume that the encoder has three inputs
(see Fig. 1): the covertext source vector, $X^n$, an independent (watermark) message source
vector $U^N = (U_1, \ldots, U_N)$, where $N$ may differ from $n$ if the two sources operate at
different rates, and a secret key (shared also with the legitimate decoder)
$K^n = (K_1, \ldots, K_n)$, which, for mathematical convenience, is assumed to operate at the
same rate as the covertext. It is assumed, at this stage, that $K^n$ is independent of $U^N$
and $X^n$. Now, in addition to requirements (i)-(iii), we impose a requirement on the
equivocation of the message source relative to an eavesdropper that has access to $Y^n$, but
not to $K^n$. Specifically, we would like the normalized conditional entropy, $H(U^N|Y^n)/N$,
to exceed a prescribed threshold, $h$ (e.g., $h = H(U)$ for perfect secrecy). Our first result
is a coding theorem that gives a set of necessary and sufficient conditions, in terms of
single-letter inequalities, such that a triple $(D, R_c, h)$ is achievable, while maintaining
reliable reconstruction of $U^N$ at the legitimate receiver.

In the second step, we relax the requirement of perfect (or almost perfect) watermark
reconstruction, and assume that we are willing to tolerate a certain distortion between
the watermark message $U^N$ and its reconstructed version $\hat{U}^N$, that is,
$E d'(U^N, \hat{U}^N) = \sum_{i=1}^{N} E d'(U_i, \hat{U}_i) \le ND'$. For example, if $d'$ is
the Hamming distortion measure, then $D'$, of course, designates the maximum allowable bit
error probability (as opposed to the block error probability requirement of [8] and [9]).
Also, in this case, it makes sense to impose a requirement regarding the equivocation of the
reconstructed message, $\hat{U}^N$, namely, $H(\hat{U}^N|Y^n)/N \ge h'$, for some prescribed
constant $h'$. The rationale is that it is $\hat{U}^N$, not $U^N$, that is actually conveyed
to the legitimate receiver, and hence there is an incentive to protect the secrecy of
$\hat{U}^N$. We will take into account both equivocation requirements, with the understanding
that if one of them is superfluous, then the corresponding threshold ($h$ or $h'$
accordingly) can always be set to zero. Our second result then extends the above-mentioned
coding theorem to a single-letter characterization of achievable quintuples
$(D, D', R_c, h, h')$. As will be seen, this coding theorem gives rise to a threefold
separation theorem that separates, without asymptotic loss of optimality, between three
stages: rate-distortion coding of $U^N$, encryption of the compressed bitstream, and finally,
embedding the resulting encrypted version using the embedding scheme of [8]. The necessary
and sufficient conditions related to the encryption are completely decoupled from those of
the embedding and the stegotext compression.

In the third and last step, we drop the assumption of an attack-free system and we
assume a given memoryless attack channel, in analogy to [9]. Again, referring to Fig. 1,
it should be understood that the stegotext $Y^n$ is stored (or transmitted) in compressed
form, and that the attacker decompresses $Y^n$ before the attack and re-compresses it after
(the compression and decompression units are omitted from the figure). As it will turn out,
in the case of a memoryless attack, there is an interaction between the encryption and the
embedding, even if the key is still assumed independent of the covertext. In particular, it
will be interesting to see that the key, in addition to its original role in encryption,
serves as SI that is available to both encoder and decoder (see Fig. 2). Also, because of
the dependence between the key and the composite signal, and the fact that the key is
available to the legitimate decoder as well, it is reasonable to let the compressibility
constraint correspond also to the conditional entropy of $Y^n$ given $K^n$, that is, private
compression as opposed to the previously considered public compression, without the key,
which enables decompression but not decryption (when these two operations are carried out by
different, remote units). Accordingly, we will consider both the conditional and the
unconditional entropies of $Y^n$, i.e., $H(Y^n)/n \le R_c$ and $H(Y^n|K^n)/n \le R'_c$. Our
final result then is a coding theorem that provides a single-letter characterization of the
region of achievable six-tuples $(D, D', R_c, R'_c, h, h')$. Interestingly, this
characterization remains essentially unaltered even if there is dependence between the key
and the covertext, which is a reasonable thing to have once the key and the stegotext
interact in the first place.$^1$ In this context, the system designer confronts an
interesting dilemma regarding the desirable degree of statistical dependence between the key
and the covertext, which affects the dependence between the key and the stegotext. On the
one hand, strong dependence can reduce the entropy of $Y^n$ given $K^n$ (and thereby reduce
$R'_c$), and can also help in the embedding process: For example, the extreme case of
$K^n = X^n$ (which corresponds to private watermarking since the decoder actually has access
to the covertext) is particularly interesting because in this case, for the encryption key,
there is no need for any external resources of randomness, in addition to the randomness of
the covertext that is already available. On the other hand, when there is strong dependence
between $K^n$ and $Y^n$, the secrecy of the watermark might be sacrificed since
$H(K^n|Y^n)$ decreases as well. An interesting point, in this context, is that the
Slepian-Wolf encoder [13] (see Fig. 2) is used to generate, from $K^n$, random bits that
are essentially independent of $Y^n$ (as $Y^n$ is generated only after the encryption). These
aspects will be seen in detail in Section 4, and even more so, in Section 6.

The remaining parts of this paper are organized as follows: In Section 2, we set some 
notation conventions. Section 3 will be devoted to a formal problem description and to 
the presentation of the main result for the attack-free case with distortion-free watermark 
reconstruction (first step described above). In Section 4, the setup and the results will 
be extended along the lines of the second and the third steps, detailed above, i.e., a given 
distortion level in the watermark reconstruction and the incorporation of an attack channel. 
Finally, Sections 5 and 6 will be devoted to the proof of the last (and most general) version
of the coding theorem, with Section 5 focusing on the converse part and Section 6 on the
direct part.

$^1$ In fact, the choice of the conditional distribution $P(K^n|X^n)$ is a degree of freedom
that can be optimized subject to the given randomness resources.

2 Notation Conventions



We begin by establishing some notation conventions. Throughout this paper, scalar random 
variables (RV's) will be denoted by capital letters, their sample values will be denoted by 
the respective lower case letters, and their alphabets will be denoted by the respective 
calligraphic letters. A similar convention will apply to random vectors and their sample 
values, which will be denoted with the same symbols superscripted by the dimension. Thus,
for example, $A^\ell$ ($\ell$ a positive integer) will denote a random $\ell$-vector
$(A_1, \ldots, A_\ell)$, and $a^\ell = (a_1, \ldots, a_\ell)$ is a specific vector value in
$\mathcal{A}^\ell$, the $\ell$-th Cartesian power of $\mathcal{A}$. The notations $a_i^j$
and $A_i^j$, where $i$ and $j$ are integers and $i \le j$, will designate segments
$(a_i, \ldots, a_j)$ and $(A_i, \ldots, A_j)$, respectively, where for $i = 1$, the subscript
will be omitted (as above). For $i > j$, $a_i^j$ (or $A_i^j$) will be understood as the null
string. Sequences without specifying indices are denoted by $\{\cdot\}$.

Sources and channels will be denoted generically by the letter $P$, or $Q$, subscripted by
the name of the RV and its conditioning, if applicable, e.g., $P_U(u)$ is the probability
function of $U$ at the point $U = u$, $P_{K|X}(k|x)$ is the conditional probability of
$K = k$ given $X = x$, and so on. Whenever clear from the context, these subscripts will be
omitted. Information theoretic quantities like entropies and mutual informations will be
denoted following the usual conventions of the information theory literature, e.g.,
$H(U^N)$, $I(X^n; Y^n)$, and so on. For single-letter information quantities (i.e., when
$n = 1$ or $N = 1$), subscripts will be omitted, e.g., $H(U^1) = H(U_1)$ will be denoted by
$H(U)$, similarly, $I(X^1; Y^1) = I(X_1; Y_1)$ will be denoted by $I(X; Y)$, and so on.
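
As a small numerical illustration of the single-letter quantities $H(U)$ and $I(X;Y)$ used throughout, the following Python sketch evaluates them for finite PMFs. The function names and the list-based representation of PMFs are assumptions made here for illustration only; they are not part of the paper.

```python
import math

def entropy(pmf):
    """Shannon entropy, in bits, of a PMF given as an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def mutual_information(joint):
    """I(X;Y), in bits, for a joint PMF given as a nested list joint[x][y]."""
    p_x = [sum(row) for row in joint]          # marginal PMF of X
    p_y = [sum(col) for col in zip(*joint)]    # marginal PMF of Y
    p_xy = [p for row in joint for p in row]   # flattened joint PMF
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)

if __name__ == "__main__":
    # A fair bit, and a pair (X, Y) with Y = X.
    print(entropy([0.5, 0.5]))
    print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))
```

The identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$ is used so that a single entropy routine suffices.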

3 Problem Definition and Main Result for Step 1 

We now turn to the formal description of the model and the problem setting for step 1, as
described in the Introduction. A source $P_X$, henceforth referred to as the covertext source
or the host source, generates a sequence of independent copies, $\{X_t\}_{t=-\infty}^{\infty}$,
of a finite-alphabet RV, $X \in \mathcal{X}$. At the same time and independently, another
source $P_U$, henceforth referred to as the message source, or the watermark source,
generates a sequence of independent copies, $\{U_i\}_{i=-\infty}^{\infty}$, of a
finite-alphabet RV, $U \in \mathcal{U}$. The relative rate between the message source and the
covertext source is $\lambda$ message symbols per covertext symbol. This means that while the
covertext source generates a block of $n$ symbols, say, $X^n = (X_1, \ldots, X_n)$, the
message source generates a block of $N = \lambda n$ symbols, $U^N = (U_1, \ldots, U_N)$
(assuming, without essential loss of generality, that $\lambda n$ is a positive integer). In
addition to the covertext source and the message source, yet another source, $P_K$,
henceforth referred to as the key source, generates a sequence of independent copies,
$\{K_t\}_{t=-\infty}^{\infty}$, of a finite-alphabet RV, $K \in \mathcal{K}$,
independently$^2$ of both $\{X_t\}$ and $\{U_i\}$. The key source is assumed to operate at
the same rate as the covertext source, that is, while the covertext source generates the
block $X^n$ of length $n$, the key source generates a block of $n$ symbols as well,
$K^n = (K_1, \ldots, K_n)$.

Given $n$ and $\lambda$, a block code for joint watermarking, encryption, and compression is
a mapping $f_n : \mathcal{U}^N \times \mathcal{X}^n \times \mathcal{K}^n \to \mathcal{Y}^n$,
$N = \lambda n$, whose output $y^n = (y_1, \ldots, y_n) = f_n(u^N, x^n, k^n) \in \mathcal{Y}^n$
is referred to as the stegotext or the composite signal, and accordingly, the finite
alphabet $\mathcal{Y}$ is referred to as the stegotext alphabet. Let
$d : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^+$ denote a single-letter distortion
measure between covertext symbols and stegotext symbols, and let the distortion between the
vectors, $x^n \in \mathcal{X}^n$ and $y^n \in \mathcal{Y}^n$, be defined additively across
the corresponding components, as usual.

An $(n, \lambda, D, R_c, h, \delta)$ code is a block code for joint watermarking, encryption,
and compression, with parameters $n$ and $\lambda$, that satisfies the following requirements:

1. The expected distortion between the covertext and the stegotext satisfies

$$\sum_{t=1}^{n} E d(X_t, Y_t) \le nD. \qquad (1)$$

2. The entropy of the stegotext satisfies

$$H(Y^n) \le nR_c. \qquad (2)$$

3. The equivocation of the message source satisfies

$$H(U^N|Y^n) \ge Nh. \qquad (3)$$

4. There exists a decoder $g_n : \mathcal{Y}^n \times \mathcal{K}^n \to \mathcal{U}^N$ such that

$$P_e \triangleq \Pr\{g_n(Y^n, K^n) \ne U^N\} \le \delta. \qquad (4)$$

$^2$ The assumption of independence between $\{K_t\}$ and $\{X_t\}$ is temporary and made now
primarily for the sake of simplicity of the exposition. It will be dropped later on.

For a given $\lambda$, a triple $(D, R_c, h)$ is said to be achievable if for every
$\epsilon > 0$, there is a sufficiently large $n$ for which
$(n, \lambda, D + \epsilon, R_c + \epsilon, h - \epsilon, \epsilon)$ codes exist. The
achievable region of triples $(D, R_c, h)$ is the set of all achievable triples
$(D, R_c, h)$. For simplicity, it is assumed$^3$ that $H(K) \le \lambda H(U)$, as this upper
limit on $H(K)$ suffices to achieve perfect secrecy. Our first coding theorem is the
following:

Theorem 1 A triple $(D, R_c, h)$ is achievable if and only if the following conditions are
both satisfied:

(a) $h \le H(K)/\lambda$.

(b) There exists a channel $\{P_{Y|X}(y|x), x \in \mathcal{X}, y \in \mathcal{Y}\}$ such that:
(i) $H(Y|X) \ge \lambda H(U)$, (ii) $R_c \ge \lambda H(U) + I(X;Y)$, and (iii) $D \ge E d(X,Y)$.
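
To make the conditions of Theorem 1 concrete, the following Python sketch tests them for one candidate channel $P_{Y|X}$. The function name `check_theorem1` and its argument layout are assumptions introduced here for illustration. Since condition (b) only asks for the existence of a suitable channel, a `True` result certifies achievability of the triple, whereas a `False` result says only that this particular channel does not work.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a PMF given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def check_theorem1(p_x, channel, d, p_u, p_k, lam, D, R_c, h):
    """Check conditions (a) and (b) of Theorem 1 for ONE candidate channel.

    p_x[x]        : covertext PMF P_X
    channel[x][y] : candidate test channel P_{Y|X}(y|x)
    d[x][y]       : single-letter distortion measure d(x, y)
    p_u, p_k      : message and key PMFs; lam is the relative rate lambda
    """
    h_u, h_k = entropy(p_u), entropy(p_k)
    xs, ys = range(len(p_x)), range(len(channel[0]))
    # H(Y|X) = sum_x P_X(x) H(Y | X = x)
    h_y_given_x = sum(p_x[x] * entropy(channel[x]) for x in xs)
    p_y = [sum(p_x[x] * channel[x][y] for x in xs) for y in ys]
    i_xy = entropy(p_y) - h_y_given_x                 # I(X;Y) = H(Y) - H(Y|X)
    exp_dist = sum(p_x[x] * channel[x][y] * d[x][y] for x in xs for y in ys)
    cond_a = h <= h_k / lam
    cond_b = (h_y_given_x >= lam * h_u
              and R_c >= lam * h_u + i_xy
              and D >= exp_dist)
    return cond_a and cond_b

if __name__ == "__main__":
    # Hypothetical numbers: uniform binary covertext, binary symmetric test
    # channel with crossover 0.2, Hamming distortion, fair-bit message source,
    # biased key source, and lambda = 1/2.
    p_x = [0.5, 0.5]
    bsc = [[0.8, 0.2], [0.2, 0.8]]
    hamming = [[0, 1], [1, 0]]
    print(check_theorem1(p_x, bsc, hamming, [0.5, 0.5], [0.9, 0.1],
                         lam=0.5, D=0.25, R_c=0.8, h=0.9))
```

In a full evaluation one would search over all channels $P_{Y|X}$; the sketch deliberately leaves that optimization out.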

As can be seen, the encryption, on the one hand, and the embedding and the compression,
on the other hand, do not interact at all in this theorem. There is a complete
decoupling between them: While condition (a) refers solely to the key and the secrecy of
the watermark, condition (b) is only about the embedding-compression part, and it is a
replica of the conditions of the coding theorem in [8], where the role of the embedding rate,
$R_e$ (see Introduction above), is played by the product $\lambda H(U)$. This suggests a very
simple separation principle, telling that in order to attain a given achievable triple
$(D, R_c, h)$, first compress the watermark $U^N$ to its entropy, then encrypt $Nh$ bits (out
of the $NH(U)$) of the compressed bit-string (by bit-by-bit XORing with the same number of
compressed key bits), and finally, embed this partially encrypted compressed bit-string into
the covertext, using the coding theorem of [8] (again, see the Introduction above for a brief
description of this).
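
The middle stage of this separation scheme, partial encryption by XORing, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `xor_prefix` and the representation of bit-strings as lists of 0/1 integers are introduced here, and the compression and embedding stages are not modelled.

```python
def xor_prefix(bits, key_bits):
    """XOR the first len(key_bits) entries of `bits` with the key bits.

    The remaining entries are passed through unchanged. Applying the
    function twice with the same key restores the input, so the same
    routine serves for encryption and for decryption.
    """
    if len(key_bits) > len(bits):
        raise ValueError("key is longer than the compressed bit-string")
    encrypted_part = [b ^ k for b, k in zip(bits, key_bits)]
    return encrypted_part + list(bits[len(key_bits):])

if __name__ == "__main__":
    compressed = [1, 0, 1, 1, 0, 0, 1, 0]   # stands in for the N*H(U) compressed bits
    key = [1, 1, 0, 1]                      # stands in for the N*h compressed key bits
    cipher = xor_prefix(compressed, key)
    print(cipher)
    print(xor_prefix(cipher, key) == compressed)
```

Only the first $Nh$ positions are protected by the key, which is why the equivocation achieved by this stage is governed by the key rate, as in condition (a).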

4 Extensions to Steps 2 and 3 

Moving on to Step 2, we now relax requirement no. 4 in the above definition of an
$(n, \lambda, D, R_c, h, \delta)$ code, and allow a certain distortion between $U^N$ and its
reconstruction $\hat{U}^N$ at the legitimate decoder. More precisely, let $\hat{\mathcal{U}}$
denote a finite alphabet, henceforth referred to as the message reconstruction alphabet. Let
$d' : \mathcal{U} \times \hat{\mathcal{U}} \to \mathbb{R}^+$ denote a single-letter
distortion measure between message symbols and message reconstruction symbols, and let the
distortion between vectors $u^N \in \mathcal{U}^N$ and $\hat{u}^N \in \hat{\mathcal{U}}^N$ be,
again, defined additively across the corresponding components. Finally, let $R_U(D')$ denote
the rate-distortion function of the source $P_U$ w.r.t. $d'$, i.e.,

$$R_U(D') = \min\{I(U; \hat{U}) : E d'(U, \hat{U}) \le D'\}. \qquad (5)$$

It will now be assumed that $H(K) \le \lambda R_U(D')$, for the same reasoning as before.

$^3$ At the end of Section 4 (after Theorem 4), we discuss the case where this limitation (or
its analogue in lossy reconstruction of $U^N$) is dropped.

Requirement no. 4 is now replaced by the following requirement: There exists a decoder
$g_n : \mathcal{Y}^n \times \mathcal{K}^n \to \hat{\mathcal{U}}^N$ such that
$\hat{U}^N = (\hat{U}_1, \ldots, \hat{U}_N) = g_n(Y^n, K^n)$ satisfies:

$$\sum_{i=1}^{N} E d'(U_i, \hat{U}_i) \le ND'. \qquad (6)$$

In addition to this modification of requirement no. 4, we add, to requirement no. 3, a
specification regarding the minimum allowed equivocation w.r.t. the reconstructed message:

$$H(\hat{U}^N|Y^n) \ge Nh', \qquad (7)$$

in order to guarantee that the reconstructed message is also sufficiently secure.
Accordingly, we modify the above definition of a block code as follows: An
$(n, \lambda, D, D', R_c, h, h')$ code is a block code for joint watermarking, encryption,
and compression with parameters $n$ and $\lambda$ that satisfies requirements 1-4, with the
above modifications of requirements 3 and 4. For a given $\lambda$, a quintuple
$(D, D', R_c, h, h')$ is said to be achievable if for every $\epsilon > 0$, there is a
sufficiently large $n$ for which
$(n, \lambda, D + \epsilon, D' + \epsilon, R_c + \epsilon, h - \epsilon, h' - \epsilon)$
codes exist. Our second theorem extends Theorem 1 to this setting:

Theorem 2 A quintuple $(D, D', R_c, h, h')$ is achievable if and only if the following
conditions are all satisfied:

(a) $h \le H(K)/\lambda + H(U) - R_U(D')$.

(b) $h' \le H(K)/\lambda$.

(c) There exists a channel $\{P_{Y|X}(y|x), x \in \mathcal{X}, y \in \mathcal{Y}\}$ such that:
(i) $\lambda R_U(D') \le H(Y|X)$, (ii) $R_c \ge \lambda R_U(D') + I(X;Y)$, and
(iii) $D \ge E d(X,Y)$.

As can be seen, the passage from Theorem 1 to Theorem 2 includes the following
modifications: In condition (c), $H(U)$ is simply replaced by $R_U(D')$, as expected. This
means that the lossless compression code of $U^N$, in the achievability of Theorem 1, is now
replaced by a rate-distortion code for distortion level $D'$. Conditions (a) and (b) now
tell us that the key rate (in terms of entropy) should be sufficiently large to satisfy both
equivocation requirements. Note that the condition regarding the equivocation w.r.t. the
clean message source is softer than in Theorem 1, as $H(U) - R_U(D') \ge 0$. This is because
the rate-distortion code for $U^N$ already introduces an uncertainty of $H(U) - R_U(D')$
bits per symbol, and so, the encryption should only complete it to the desired level of $h$
bits per symbol. This point is discussed in depth in [14]. Of course, by setting $D' = 0$
(and hence also $h' = h$), we are back to Theorem 1.

We also observe that the encryption and the embedding are still decoupled in Theorem 2,
and that an achievable quintuple can still be attained by separation: First, apply a
rate-distortion code to $U^N$, as mentioned earlier, then encrypt
$N \cdot \max\{h + R_U(D') - H(U), h'\}$ bits of the compressed codeword (to satisfy both
equivocation requirements), and finally, embed the (partially) encrypted codeword into
$X^n$, again, by using the scheme of [8]. Note that without the encryption and without
requirement no. 2 of the compressibility of $Y^n$, this separation principle is a special
case of the one in [10], where a separation theorem was established for the Wyner-Ziv source
(with SI correlated to the source at the decoder) and the Gel'fand-Pinsker channel (with
channel SI at the encoder). Here, there is no SI correlated to the source and the role of
channel SI is fulfilled by the covertext. Thus, the new observation here is that the
separation theorem continues to hold in the presence of encryption and requirement no. 2.
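
The number of encrypted bits per message symbol, $\max\{h + R_U(D') - H(U), h'\}$, is easy to evaluate for a toy source. The Python sketch below does so for a binary symmetric source under Hamming distortion, for which $R_U(D') = 1 - h_2(D')$ when $0 \le D' \le 1/2$ (a standard closed form). The function names and the numerical values are assumptions chosen for illustration.

```python
import math

def binary_entropy(p):
    """Binary entropy function h2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def encrypted_bits_per_symbol(h, h_prime, H_U, R_U):
    """max{h + R_U(D') - H(U), h'}: encrypted bits per message symbol."""
    return max(h + R_U - H_U, h_prime)

def key_rate_suffices(h, h_prime, H_U, R_U, H_K, lam):
    """Conditions (a) and (b) of Theorem 2, written as a single comparison."""
    return encrypted_bits_per_symbol(h, h_prime, H_U, R_U) <= H_K / lam

if __name__ == "__main__":
    H_U = 1.0                          # fair-bit message source
    R_U = 1.0 - binary_entropy(0.1)    # rate-distortion function at D' = 0.1
    print(encrypted_bits_per_symbol(0.8, 0.4, H_U, R_U))
    print(key_rate_suffices(0.8, 0.4, H_U, R_U, H_K=0.25, lam=0.5))
```

Conditions (a) and (b) of Theorem 2 together say exactly that this maximum does not exceed $H(K)/\lambda$, which is why a single comparison suffices in `key_rate_suffices`.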

Finally, we turn to step 3, of including an attack channel (see Fig. 1). Let $\mathcal{Z}$ be
a finite alphabet, henceforth referred to as the forgery alphabet, and let
$\{P_{Z|Y}(z|y), y \in \mathcal{Y}, z \in \mathcal{Z}\}$ denote a set of conditional PMF's
from the stegotext alphabet to the forgery alphabet. We now assume that the stegotext vector
is subjected to an attack modelled by the memoryless channel,

$$P_{Z^n|Y^n}(z^n|y^n) = \prod_{t=1}^{n} P_{Z|Y}(z_t|y_t). \qquad (8)$$

The output $Z^n$ of the attack channel will henceforth be referred to as the forgery.

It is now assumed that the legitimate decoder has access to $Z^n$, rather than $Y^n$ (in
addition, of course, to $K^n$). Thus, in requirement no. 4, the decoder is redefined again,
this time, as a mapping $g_n : \mathcal{Z}^n \times \mathcal{K}^n \to \hat{\mathcal{U}}^N$
such that $\hat{U}^N = g_n(Z^n, K^n)$ satisfies the distortion constraint (6). As for the
equivocation requirements, the conditioning will now be on both $Y^n$ and $Z^n$, i.e.,

$$H(U^N|Y^n, Z^n) \ge Nh \quad \text{and} \quad H(\hat{U}^N|Y^n, Z^n) \ge Nh', \qquad (9)$$

since if the attacker and the eavesdropper are the same party (or if they cooperate), then
s/he may access both. In fact, for the equivocation of $U^N$, the conditioning on $Z^n$ is
immaterial since $U^N \to Y^n \to Z^n$ is always a Markov chain, but it is not clear that
$Z^n$ is superfluous for the equivocation w.r.t. $\hat{U}^N$, since $Z^n$ is one of the
inputs to the decoder whose output is $\hat{U}^N$. Nonetheless, for the sake of uniformity
and convenience (in the proof), we keep the conditioning on $Z^n$ in both equivocation
criteria.

Redefining block codes and achievable quintuples $(D, D', R_c, h, h')$ according to the
modified requirements in the same spirit, we now have the following coding theorem, which
is substantially different from Theorems 1 and 2:

Theorem 3 A quintuple $(D, D', R_c, h, h')$ is achievable if and only if there exist RV's $V$
and $Y$ such that
$P_{KXVYZ}(k, x, v, y, z) = P_X(x) P_K(k) P_{VY|KX}(v, y|k, x) P_{Z|Y}(z|y)$, where the
alphabet size of $V$ is bounded by
$|\mathcal{V}| \le |\mathcal{K}| \cdot |\mathcal{X}| \cdot |\mathcal{Y}| + 1$, and such that
the following conditions are all satisfied:

(a) $h \le H(K|Y)/\lambda + H(U) - R_U(D')$.

(b) $h' \le H(K|Y)/\lambda$.

(c) $\lambda R_U(D') \le I(V;Z|K) - I(V;X|K)$.

(d) $R_c \ge \lambda R_U(D') + I(X;Y,V|K) + I(K;Y)$.

(e) $D \ge E d(X,Y)$.

First, observe that here, unlike in Theorems 1 and 2, it is no longer true that the
encryption and the embedding (along with stegotext compression) are decoupled, yet the
rate-distortion compression of $U^N$ is still separate and decoupled from both. In other
words, the separation principle applies here in a partial manner only. Note that now,
although $K$ is still assumed independent of $X$, it may, in general, depend on $Y$. On the
negative side, this dependence causes a reduction in the equivocation of both the message
source and its reconstruction, and therefore $H(K|Y)$ replaces $H(K)$ in conditions (a) and
(b). On the positive side, this dependence introduces new degrees of freedom in enhancing
the tradeoffs between the embedding performance (condition (c)) and the compressibility
(condition (d)).

The achievability of Theorem 3 involves essentially the same stages as before
(rate-distortion coding of $U^N$, followed by encryption, followed in turn by embedding), but
this time, the embedding scheme is a conditional version of the one proposed in [9], where
all codebooks depend on $K^n$, the SI given at both ends (see Fig. 2). An interesting point
regarding the encryption is that one needs to generate, from $K^n$, essentially $nH(K|Y)$
random bits that are independent of $Y^n$ (and $Z^n$), in order to protect the secrecy
against an eavesdropper that observes $Y^n$ and $Z^n$. Clearly, if $Y^n$ were given in
advance to the encrypting unit, then the compressed bitstring of an optimal lossless source
code that compresses $K^n$, given $Y^n$ as SI, would have this property (for if there were
any dependence, then this bitstring could have been further compressed, which is a
contradiction). However, such a source code cannot be implemented since $Y^n$ itself is
generated from the encrypted message, i.e., after the encryption. In other words, this would
have required a circular mechanism, which may not be feasible. A simple remedy is then to
use a Slepian-Wolf encoder [13] that generates $nH(K|Y)$ bits that are essentially
independent of $Y^n$ (due to the same consideration), without the need to access the vector
$Y^n$ to be generated. For more details, the reader is referred to the proof of the direct
part (Section 6).
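
The key property used above is that a Slepian-Wolf style encoder maps $K^n$ to a short bin index without ever looking at $Y^n$. One textbook way to realize such binning, shown here as a minimal Python sketch, is a random linear map over GF(2). This is an illustrative stand-in, not necessarily the construction used in the proof of Section 6; it assumes a binary key alphabet, and the function names, the shared seed, and the choice of the number of output bits (roughly $nH(K|Y)$) are all assumptions introduced here.

```python
import random

def random_binning_matrix(num_bits_out, n, seed=0):
    """Draw a random binary (num_bits_out x n) matrix.

    Encoder and decoder are assumed to share the seed, so that both
    ends hold the same binning rule.
    """
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(num_bits_out)]

def bin_index(matrix, key_bits):
    """Bin index of a binary key vector: matrix times key, over GF(2)."""
    return [sum(m & k for m, k in zip(row, key_bits)) % 2 for row in matrix]

if __name__ == "__main__":
    n = 8                 # key block length
    num_bits_out = 3      # stands in for roughly n * H(K|Y) output bits
    matrix = random_binning_matrix(num_bits_out, n, seed=1)
    key = [1, 0, 1, 1, 0, 0, 1, 0]
    print(bin_index(matrix, key))   # computed from the key alone, without Y^n
```

The sketch only illustrates the interface (key in, short index out, no access to the stegotext); the independence of the output bits from $Y^n$ is an asymptotic statement that this toy example does not verify.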

Observe that in the absence of attack (i.e., $Z = Y$), Theorem 2 is obtained as a special
case of Theorem 3 by choosing $V = Y$ and letting both be independent of $K$, a choice
which is simultaneously the best for conditions (a)-(d) of Theorem 3. To see this, note the
following simple inequalities: In conditions (a) and (b), $H(K|Y) \le H(K)$. In condition
(c), by setting $Z = Y$, we have

$$\begin{aligned}
I(V;Y|K) - I(V;X|K) &\le I(V;X,Y|K) - I(V;X|K) \\
&= I(V;Y|X,K) \\
&\le H(Y|X,K) \\
&\le H(Y|X).
\end{aligned} \qquad (10)$$

Finally, in condition (d), clearly, $I(K;Y) \ge 0$, and since $X$ is independent of $K$, then
$I(X;Y,V|K) = I(X;Y,V,K) \ge I(X;Y)$. Thus, for $Z = Y$, the achievable region of
Theorem 3 is a subset of the one given in Theorem 2. However, since all these inequalities
become equalities at the same time by choosing $V = Y$ and letting both be independent of
$K$, the two regions are identical in the attack-free case.

Returning now to Theorem 3, as we observed, K n is now involved not only in the role of 
a cipher key, but also as SI available at both encoder and decoder. Two important points 
are now in order, in view of this fact. 

First, one may argue that, actually, there is no real reason to assume that $K^n$ is
necessarily independent of $X^n$. If the user has control of the mechanism of generating
the key, then s/he might implement, in general, a channel $P_{K^n|X^n}(k^n|x^n)$ using the
available randomness resources, and taking (partial) advantage of the randomness of the
covertext. Let us assume that this channel is stationary and memoryless, i.e.,

$$P_{K^n|X^n}(k^n|x^n) = \prod_{t=1}^{n} P_{K|X}(k_t|x_t) \qquad (11)$$

with the single-letter transition probabilities
$\{P_{K|X}(k|x), x \in \mathcal{X}, k \in \mathcal{K}\}$ left as a degree of freedom for
design. While so far, we assumed that $K$ was independent of $X$, the other extreme is, of
course, $K = X$ (corresponding to private watermarking). Note, however, that in the
attack-free case, in the absence of the compressibility requirement no. 2 (say,
$R_c = \infty$), no optimality is lost by assuming that $K$ is independent of $X$, since the
only inequality where we have used the independence assumption, in the previous paragraph,
corresponds to condition (d).

The second point is that in Theorems 1-3, so far, we have defined the compressibility of
the stegotext in terms of $H(Y^n)$, which is suitable when the decompression of $Y^n$ is
public, i.e., without access to $K^n$. The legitimate decoder in our model, on the other
hand, has access to the SI $K^n$, which may depend on $Y^n$. In this context, it then makes
sense to measure the compressibility of the stegotext also in a private regime, i.e., in
terms of the conditional entropy, $H(Y^n|K^n)$.

Our last (and most general) version of the coding theorem below takes these two points
into account. Specifically, let us impose, in requirement no. 2, an additional inequality,

$$H(Y^n|K^n) \le nR'_c, \qquad (12)$$

where $R'_c$ is a prescribed constant, and let us redefine accordingly the block codes and
the achievable region in terms of six-tuples $(D, D', R_c, R'_c, h, h')$. We now have the
following result:

Theorem 4 A six-tuple $(D, D', R_c, R'_c, h, h')$ is achievable if and only if there exist
RV's $V$ and $Y$ such that
$P_{KXVYZ}(k, x, v, y, z) = P_{XK}(x, k) P_{VY|KX}(v, y|k, x) P_{Z|Y}(z|y)$, where the
alphabet size of $V$ is bounded by
$|\mathcal{V}| \le |\mathcal{K}| \cdot |\mathcal{X}| \cdot |\mathcal{Y}| + 1$, and such that
the following conditions are all satisfied:

(a) $h \le H(K|Y)/\lambda + H(U) - R_U(D')$.

(b) $h' \le H(K|Y)/\lambda$.

(c) $\lambda R_U(D') \le I(V;Z|K) - I(V;X|K)$.

(d) $R_c \ge \lambda R_U(D') + I(X;Y,V|K) + I(K;Y)$.

(e) $R'_c \ge \lambda R_U(D') + I(X;Y,V|K)$.

(f) $D \ge E d(X,Y)$.

Note that the additional condition, (e), is similar to condition (d) except for the term
$I(K;Y)$. Also, in the joint PMF of $(K,X,V,Y,Z)$ we are no longer assuming that $K$ and
$X$ are independent. It should be pointed out that in the presence of the new requirement
regarding $H(Y^n|K^n)$, it is now clearer that introducing dependence of $(V,Y)$ upon
$K$ is reasonable, in general. In the case $K = X$, which was mentioned earlier, the term
$I(V;X|K)$ in condition (c), and the term $I(X;Y,V|K)$ in conditions (d) and (e), both
vanish. Thus, both embedding performance and compression performance improve, as in
private watermarking.
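
As an illustration only, the six conditions of Theorem 4 can be checked mechanically once the single-letter quantities have been computed for a candidate pair $(V, Y)$. The sketch below is not part of the paper: the class name, field names and function name are hypothetical, all quantities are assumed to be supplied in bits, and a small numerical tolerance is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class SingleLetterQuantities:
    """Hypothetical container for the quantities appearing in Theorem 4."""
    H_K_given_Y: float     # H(K|Y)
    H_U: float             # H(U)
    R_U: float             # R_U(D')
    I_VZ_given_K: float    # I(V;Z|K)
    I_VX_given_K: float    # I(V;X|K)
    I_X_YV_given_K: float  # I(X;Y,V|K)
    I_KY: float            # I(K;Y)
    E_d: float             # Ed(X,Y)
    lam: float             # lambda = N/n


def theorem4_conditions(q, D, R_c, R_c_priv, h, h_rep, tol=1e-12):
    """Return which of conditions (a)-(f) hold for the given six-tuple."""
    payload = q.lam * q.R_U  # lambda * R_U(D')
    return {
        "a": h <= q.H_K_given_Y / q.lam + q.H_U - q.R_U + tol,
        "b": h_rep <= q.H_K_given_Y / q.lam + tol,
        "c": payload <= q.I_VZ_given_K - q.I_VX_given_K + tol,
        "d": R_c >= payload + q.I_X_YV_given_K + q.I_KY - tol,
        "e": R_c_priv >= payload + q.I_X_YV_given_K - tol,
        "f": D >= q.E_d - tol,
    }
```

A six-tuple is certified by a given $(V, Y)$ only if every entry of the returned dictionary is true; achievability in the sense of the theorem requires the existence of at least one such pair.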

Finally, a comment is in order regarding the assumption $H(K) \le \lambda R_U(D')$, which
implies that $H(K|Y)$ cannot exceed $\lambda R_U(D')$ either. If this assumption is removed, and
even $H(K|Y)$ is allowed to exceed $\lambda R_U(D')$, then Theorem 4 can be somewhat further
extended. While $h$ cannot be further improved if $H(K|Y)$ is allowed to exceed $\lambda R_U(D')$
(as it already reaches the maximum possible value, $h = H(U)$, for $H(K|Y) = \lambda R_U(D')$), it
turns out that there is still room for improvement in $h'$. Suppose that instead of one
rate-distortion codebook for $U^N$, we have many disjoint codebooks. In fact, it has been shown
in [H] that there are exponentially $2^{NH(\hat{U}|U)}$ disjoint codebooks, each covering the set of
typical source sequences by jointly typical codewords. Now, if $H(K|Y) > \lambda R_U(D')$, we can
use the $T = nH(K|Y) - NR_U(D')$ excess bits of the compressed key (beyond the $NR_U(D')$
bits that are used to encrypt the binary representation of $\hat{U}^N$) so as to select one of $2^T$
codebooks (as long as $T \le NH(\hat{U}|U)$), and thus reach a total equivocation of $nH(K|Y)$ as
long as $nH(K|Y) \le NH(\hat{U})$, or equivalently, $H(K|Y) \le \lambda H(\hat{U})$. The equivocation level
$h' = H(\hat{U})$ is now the "saturation value" that cannot be further improved (in analogy to
$h = H(U)$ for the original source). This means that condition (b) of Theorem 4 would now
be replaced by the condition

$$h' \le \min\{H(\hat{U}),\ H(K|Y)/\lambda\}. \tag{13}$$

But with this condition, it is no longer clear that the best test channel for lossy compression
of $U^N$ is the one that achieves $R_U(D')$, because for the above modified version of condition
(b), it would be best to have $H(\hat{U})$ as large as possible (as long as it is below $H(K|Y)/\lambda$),
which is in partial conflict with the minimization of $I(U;\hat{U})$ that leads to $R_U(D')$. Therefore,
a restatement of Theorem 4 would require the existence of a channel
$\{P_{\hat{U}|U}(\hat{u}|u),\ u \in \mathcal{U},\ \hat{u} \in \hat{\mathcal{U}}\}$ (in addition to the existing requirement of a channel $P_{VY|KX}$), such that the random
variable $\hat{U}$ now takes part in the compromise among all criteria of the problem. This means
that in conditions (a), (c), (d), and (e) of Theorem 4, $R_U(D')$ should be replaced by $I(U;\hat{U})$,
and there would be an additional condition (g): $Ed'(U,\hat{U}) \le D'$. Condition (a), in view of
the earlier discussion above, would now be of the form:

$$h \le \min\{H(U),\ H(K|Y)/\lambda + H(U) - I(U;\hat{U})\} = H(U) - [I(U;\hat{U}) - H(K|Y)/\lambda]_+, \tag{14}$$

where $[z]_+ = \max\{0, z\}$. Of course, under the assumption $H(K) \le \lambda R_U(D')$, which we have
used thus far,

$$H(\hat{U}) \ge I(U;\hat{U}) \ge R_U(D') \ge H(K)/\lambda \ge H(K|Y)/\lambda, \tag{15}$$

in other words, $\min\{H(\hat{U}), H(K|Y)/\lambda\}$ is always attained by $H(K|Y)/\lambda$, and so the
dependence on $H(\hat{U})$ disappears, which means that the best choice of $\hat{U}$ (for all other conditions)
is back to being the one that minimizes $I(U;\hat{U})$, which gives us Theorem 4 as is.
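
The equality between the "min" form and the positive-part form of the extended condition (a) can be checked numerically. The following is a minimal sketch; the function names are hypothetical and the arguments are simply real numbers standing for $H(U)$, $I(U;\hat{U})$ and $H(K|Y)/\lambda$.

```python
def positive_part(z):
    """[z]_+ = max{0, z}."""
    return max(0.0, z)


def h_bound_min_form(H_U, I_U_Uhat, HKY_over_lam):
    """min{ H(U), H(K|Y)/lambda + H(U) - I(U;U_hat) }."""
    return min(H_U, HKY_over_lam + H_U - I_U_Uhat)


def h_bound_plus_form(H_U, I_U_Uhat, HKY_over_lam):
    """H(U) - [ I(U;U_hat) - H(K|Y)/lambda ]_+."""
    return H_U - positive_part(I_U_Uhat - HKY_over_lam)
```

When $H(K|Y)/\lambda \ge I(U;\hat{U})$ both forms return $H(U)$, which is the saturation of $h$ discussed above.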

It is interesting to point out that this additional extension gives rise to yet another
step in the direction of invalidating the separation principle: While in Theorem 4 only the
encryption and the embedding interacted, and the rate-distortion coding of $U^N$ was still
independent of all other ingredients of the system, here even this is no longer true, as the
choice of the test channel $P_{\hat{U}|U}$ also takes into account compromises that are associated
with the encryption and the embedding.

Note that this discussion applies also to classical joint source-channel coding, where
there is no embedding at all: In this case, $X$ is a degenerate RV (say, $X \equiv 0$, if $0 \in \mathcal{X}$),
and so the mutual information terms depending on $X$ in conditions (c), (d) and (e) all
vanish, the best choice of $V$ is $V = Y$ (thus, the r.h.s. in condition (c) becomes the capacity
of the channel $P_{Z|Y}$ with $K$ as SI at both ends), and condition (f) may be interpreted as
a (generalized) power constraint (with power function $\phi(y) = d(0,y)$). Nonetheless, the
new versions of conditions (a) and (b) remain the same as in eqs. (13) and (14). This is
to say that the violation of the separation principle occurs even in the classical model of a
communication system, once security becomes an issue and one is interested in the security
of the reconstructed source.

5 Proof of the Converse Part of Theorem 4 

Let an $(n, \lambda, D+\epsilon, D'+\epsilon, R_c+\epsilon, R'_c+\epsilon, h-\epsilon, h'-\epsilon)$ code be given. First, from the
requirement $H(Y^n|K^n) \le n(R'_c + \epsilon)$, we have:

\begin{align*}
n(R'_c + \epsilon) &\ge H(Y^n|K^n) \tag{16}\\
&= H(Y^n|U^N,K^n) + I(U^N;Y^n|K^n) \\
&\ge H(Y^n|U^N,K^n) + I(U^N;Z^n|K^n) \\
&= H(Y^n|U^N,K^n) + I(U^N;Z^n,K^n) \tag{17}
\end{align*}

where the second inequality comes from the data processing theorem ($U^N \to Y^n \to Z^n$ is a
Markov chain given $K^n$) and the last equality comes from the chain rule and the fact that
$U^N$ and $K^n$ are independent. Define $V_t = (X_{t+1}^n, U^N, K^{t-1}, Z^{t-1})$, $J$ as a uniform RV
over $\{1,\ldots,n\}$, $X = X_J$, $K = K_J$, $Y = Y_J$, $V' = V_J$, and $V = (V_J, J) = (V', J)$. Now,
the first term on the right-most side of eq. (17) is further lower bounded in the following
manner.



\begin{align*}
H(Y^n|U^N,K^n) &\ge I(X^n;Y^n|U^N,K^n) \\
&= I(X^n;Y^n,U^N,K^n) - I(X^n;U^N,K^n) \\
&= \sum_{t=1}^{n} I(X_t;Y^n,U^N,K^n|X_{t+1}^n) - I(X^n;K^n) \tag{18}\\
&= \sum_{t=1}^{n} I(X_t;Y^n,U^N,K^n,X_{t+1}^n) - nI(X;K) \tag{19}\\
&\ge \sum_{t=1}^{n} I(X_t;K_t,Y_t,U^N,K^{t-1},Z^{t-1},X_{t+1}^n) - nI(X;K) \tag{20}\\
&= \sum_{t=1}^{n} I(X_t;K_t,Y_t,V_t) - nI(X;K) \\
&= n[I(X;K,Y,V'|J) - I(X;K)] \\
&= n[I(X;K,Y,V',J) - I(X;K)] \tag{21}\\
&= nI(X;Y,V|K) \tag{22}
\end{align*}

where (18) is due to the chain rule and the fact that $(X^n,K^n)$ is independent of $U^N$ (hence
$U^N \to K^n \to X^n$ is trivially a Markov chain), (19) is due to the memorylessness of
$\{(X_t, K_t)\}$, (20) is due to the data processing theorem, and (21) follows from the fact
that $\{X_t\}$ is stationary and so $X = X_J$ is independent of $J$. The second term on the
right-most side of eq. (17) is in turn lower bounded following essentially the same ideas as
in the proof of the converse to the rate-distortion coding theorem (see, e.g., [3]):

\begin{align*}
I(U^N;Z^n,K^n) &= H(U^N) - H(U^N|Z^n,K^n) \\
&= \sum_{i=1}^{N} [H(U_i) - H(U_i|U^{i-1},Z^n,K^n)] \\
&= \sum_{i=1}^{N} I(U_i;U^{i-1},Z^n,K^n) \\
&\ge \sum_{i=1}^{N} I(U_i;[g_n(Z^n,K^n)]_i) \\
&\ge \sum_{i=1}^{N} R_U(Ed'(U_i,[g_n(Z^n,K^n)]_i)) \\
&\ge NR_U(D'+\epsilon), \tag{23}
\end{align*}

where $[g_n(Z^n,K^n)]_i$ denotes the $i$-th component projection of $g_n(Z^n,K^n)$, i.e., $\hat{U}_i$ as a
function of $(Z^n,K^n)$. Combining eqs. (17), (22), and (23), we get

$$n(R'_c + \epsilon) \ge NR_U(D'+\epsilon) + nI(X;Y,V|K). \tag{24}$$

Dividing by $n$, we get

$$R'_c + \epsilon \ge \lambda R_U(D'+\epsilon) + I(X;Y,V|K). \tag{25}$$

Using the arbitrariness of $\epsilon$ together with the continuity of $R_U(\cdot)$, we get condition (e) of
Theorem 4.

Condition (d) is derived in the very same manner except that the starting point is the
inequality $n(R_c + \epsilon) \ge H(Y^n)$, and when $H(Y^n)$ is further bounded from below, in analogy
to the chain of inequalities (17), there is an additional term, $I(K^n;Y^n)$, that is in turn
lower bounded in the following manner:

\begin{align*}
I(K^n;Y^n) &\ge \sum_{t=1}^{n} I(K_t;Y_t) \\
&= nI(K;Y|J) \\
&= n[H(K|J) - H(K|J,Y)] \\
&\ge n[H(K) - H(K|Y)] \\
&= nI(K;Y), \tag{26}
\end{align*}

where the first inequality is because of the memorylessness of $\{K_t\}$, and the second
inequality comes from the facts that conditioning reduces entropy (in the second term) and that
$K$ is independent of $J$ (again, due to the stationarity of $\{K_t\}$). This gives the additional
term, $I(K;Y)$, in condition (d).

Condition (c) is obtained as follows:

\begin{align*}
NR_U(D'+\epsilon) &\le I(U^N;K^n,Z^n) \\
&= I(U^N;K^n,Z^n) - I(U^N;K^n,X^n) \\
&\le \sum_{t=1}^{n} [I(V_t;K_t,Z_t) - I(V_t;K_t,X_t)] \tag{27}\\
&= n[I(V';K,Z|J) - I(V';K,X|J)] \\
&\le n[I(V',J;K,Z) - I(V',J;K,X)] \tag{28}\\
&= n[I(V;K,Z) - I(V;K,X)] \\
&= n[I(V;Z|K) - I(V;X|K)], \tag{29}
\end{align*}

where the first inequality is (23), the first equality is due to the independence between $U^N$
and $(K^n,X^n)$, the second inequality is an application of Lemma 4], the third inequality
is due to the fact that $I(K,Z;J) \ge 0$ and $I(K,X;J) = 0$ (due to the stationarity of
$\{(K_t, X_t)\}$), and the last equality is obtained by adding and subtracting $I(V;K)$. Again,
since this is true for every $\epsilon > 0$, it holds also for $\epsilon = 0$, due to continuity.
As for condition (f), we have:

$$D + \epsilon \ge \frac{1}{n}\sum_{t=1}^{n} Ed(X_t,Y_t) = Ed(X,Y), \tag{30}$$

and we use once again the arbitrariness of $\epsilon$. Regarding condition (b), we have:

\begin{align*}
nH(K|Y) &\ge nH(K|Y,J) \\
&= \sum_{t=1}^{n} H(K_t|Y_t) \\
&\ge \sum_{t=1}^{n} H(K_t|K^{t-1},Y^n) \\
&= H(K^n|Y^n) \\
&= H(K^n|Y^n,Z^n) \\
&\ge I(K^n;\hat{U}^N|Y^n,Z^n) \\
&= H(\hat{U}^N|Y^n,Z^n) - H(\hat{U}^N|Y^n,Z^n,K^n) \\
&= H(\hat{U}^N|Y^n,Z^n) \\
&\ge N(h'-\epsilon), \tag{31}
\end{align*}

where the last equality is due to the fact that $\hat{U}^N$ is, by definition, a function of $(Z^n, K^n)$,
and the last inequality is by the hypothesis that the code achieves an equivocation of at
least $N(h'-\epsilon)$. Dividing by $N$ and taking the limit $\epsilon \to 0$ leads to $h' \le H(K|Y)/\lambda$,
which is condition (b). Finally, to prove condition (a), consider the inequality
$nH(K|Y) \ge H(\hat{U}^N|Y^n,Z^n)$, which we have just proved, and proceed as follows (see also [15]):

\begin{align*}
nH(K|Y) &\ge H(\hat{U}^N|Y^n,Z^n) \\
&\ge H(\hat{U}^N|Y^n,Z^n) + N(h-\epsilon) - H(U^N|Y^n,Z^n) \\
&= N(h-\epsilon) - H(U^N) + I(U^N;Y^n,Z^n) - I(\hat{U}^N;Y^n,Z^n) + I(U^N;\hat{U}^N) + H(\hat{U}^N|U^N) \\
&\ge N[h - \epsilon - H(U) + R_U(D'+\epsilon)] + [I(U^N;Y^n,Z^n) - I(\hat{U}^N;Y^n,Z^n) + H(\hat{U}^N|U^N)], \tag{32}
\end{align*}

where the second inequality follows from the hypothesis that the code satisfies
$H(U^N|Y^n,Z^n) \ge N(h-\epsilon)$, and the third inequality is due to the memorylessness of $\{U_i\}$, the hypothesis
that $\sum_{i=1}^{N} Ed'(U_i,\hat{U}_i) \le N(D'+\epsilon)$, and the converse to the rate-distortion coding theorem.
Now, to see that the second bracketed term is non-negative, we have the following chain of
inequalities:

\begin{align*}
& I(U^N;Y^n,Z^n) - I(\hat{U}^N;Y^n,Z^n) + H(\hat{U}^N|U^N) \\
&\quad= I(U^N;Y^n,Z^n) - H(Y^n,Z^n) + H(Y^n,Z^n|\hat{U}^N) + H(\hat{U}^N|U^N) \\
&\quad\ge I(U^N;Y^n,Z^n) - H(Y^n,Z^n) + H(Y^n,Z^n|U^N,\hat{U}^N) + H(\hat{U}^N|U^N) \\
&\quad= I(U^N;Y^n,Z^n) - H(Y^n,Z^n) + H(Y^n,Z^n,\hat{U}^N|U^N) \\
&\quad\ge I(U^N;Y^n,Z^n) - H(Y^n,Z^n) + H(Y^n,Z^n|U^N) \\
&\quad= 0. \tag{33}
\end{align*}

Combining this with eq. (32), we have

$$nH(K|Y) \ge N[h - \epsilon - H(U) + R_U(D'+\epsilon)]. \tag{34}$$

Dividing again by $N$, and letting $\epsilon$ vanish, we obtain $h \le H(K|Y)/\lambda + H(U) - R_U(D')$,
which completes the proof of condition (a).

To complete the proof of the converse part, it remains to show that the alphabet size
of $V$ can be reduced to $|\mathcal{K}| \cdot |\mathcal{X}| \cdot |\mathcal{Y}| + 1$. To this end, we extend the proof of the parallel
argument in [9] by using the support lemma (cf. [1]), which is based on Carathéodory's
theorem. According to this lemma, given $J$ real-valued continuous functionals $f_j$, $j = 1,\ldots,J$,
on the set $\mathcal{P}(\mathcal{X})$ of probability distributions over the alphabet $\mathcal{X}$, and given any probability
measure $\mu$ on the Borel $\sigma$-algebra of $\mathcal{P}(\mathcal{X})$, there exist $J$ elements $Q_1,\ldots,Q_J$ of $\mathcal{P}(\mathcal{X})$ and
$J$ non-negative reals, $\alpha_1,\ldots,\alpha_J$, such that $\sum_{j=1}^{J} \alpha_j = 1$ and for every $j = 1,\ldots,J$

$$\int_{\mathcal{P}(\mathcal{X})} f_j(Q)\,\mu(dQ) = \sum_{i=1}^{J} \alpha_i f_j(Q_i). \tag{35}$$

Before we actually apply the support lemma, we first rewrite the relevant mutual
informations of Theorem 4 in a more convenient form for the use of this lemma. First, observe
that

\begin{align*}
I(V;Z|K) - I(V;X|K) &= H(Z|K) - H(Z|V,K) - H(X|K) + H(X|V,K) \\
&= H(Z|K) - H(X|K) + H(K,X|V) - H(K,Z|V), \tag{36}
\end{align*}

and

\begin{align*}
I(X;Y,V|K) &= I(X;V|K) + I(X;Y|V,K) \tag{37}\\
&= H(X|K) - H(X|V,K) + H(X|V,K) - H(X|V,Y,K) \\
&= H(X|K) - H(X|V,Y,K) \\
&= H(X|K) - H(K,X,Y|V) + H(K,Y|V). \tag{38}
\end{align*}
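
Identities (36) and (38) are pure consequences of the chain rule, so they can be sanity-checked numerically on any joint PMF. The sketch below is an illustration under stated assumptions: it draws a random joint PMF of $(K,X,V,Y,Z)$ with the product structure $P_{KXVY} \cdot P_{Z|Y}$ over small hypothetical alphabets, and all helper names are invented for this example.

```python
import itertools
import math
import random

K, X, V, Y, Z = range(5)  # coordinate positions inside each outcome tuple


def random_joint(seed=0, sizes=(2, 2, 3, 2, 2)):
    """Random PMF of (K,X,V,Y,Z) of the form P(k,x,v,y) * P(z|y)."""
    rng = random.Random(seed)
    nk, nx, nv, ny, nz = sizes
    w = {c: rng.random()
         for c in itertools.product(range(nk), range(nx), range(nv), range(ny))}
    total = sum(w.values())
    channel = []
    for _ in range(ny):
        row = [rng.random() for _ in range(nz)]
        s = sum(row)
        channel.append([r / s for r in row])
    return {(k, x, v, y, z): w[(k, x, v, y)] / total * channel[y][z]
            for (k, x, v, y) in w for z in range(nz)}


def entropy(pmf, axes):
    """Joint entropy (bits) of the coordinates listed in `axes`."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[a] for a in axes)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cond_entropy(pmf, a, b):
    return entropy(pmf, a + b) - entropy(pmf, b)


def cond_mi(pmf, a, b, c):
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def eq36_gap(pmf):
    lhs = cond_mi(pmf, (V,), (Z,), (K,)) - cond_mi(pmf, (V,), (X,), (K,))
    rhs = (cond_entropy(pmf, (Z,), (K,)) - cond_entropy(pmf, (X,), (K,))
           + cond_entropy(pmf, (K, X), (V,)) - cond_entropy(pmf, (K, Z), (V,)))
    return lhs - rhs


def eq38_gap(pmf):
    lhs = cond_mi(pmf, (X,), (Y, V), (K,))
    rhs = (cond_entropy(pmf, (X,), (K,)) - cond_entropy(pmf, (K, X, Y), (V,))
           + cond_entropy(pmf, (K, Y), (V,)))
    return lhs - rhs
```

Both gaps should vanish up to floating-point error for every seed, which is consistent with the claim that only the $V$-conditional entropies need to be preserved.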

For a given joint distribution of $(K,X,Y)$, and given $P_{Z|Y}$, $H(Z|K)$ and $H(X|K)$ are both
given and unaffected by $V$. Therefore, in order to preserve prescribed values of $I(V;Z|K) -
I(V;X|K)$ and $I(X;V,Y|K)$, it is sufficient to preserve the associated values $H(K,X|V) -
H(K,Z|V)$ and $H(K,X,Y|V) - H(K,Y|V)$. Let us define then the following functionals
of a generic distribution $Q$ over $\mathcal{K} \times \mathcal{X} \times \mathcal{Y}$, where $\mathcal{K} \times \mathcal{X} \times \mathcal{Y}$ is assumed, without loss of
generality, to be $\{1, 2, \ldots, m\}$, $m = |\mathcal{K}| \cdot |\mathcal{X}| \cdot |\mathcal{Y}|$:

$$f_i(Q) = Q(k,x,y), \quad i = (k,x,y) = 1,\ldots,m-1. \tag{39}$$

Next define

$$f_m(Q) = \sum_{k,x,y} Q(k,x,y) \sum_{z} P_{Z|Y}(z|y) \log\frac{Q(k,z)}{Q(k,x)}, \tag{40}$$

and

$$f_{m+1}(Q) = \sum_{k,x,y} Q(k,x,y) \log\frac{Q(k,y)}{Q(k,x,y)}, \tag{41}$$

where $Q(k,x)$ and $Q(k,y)$ denote the marginals of $Q$, and $Q(k,z) = \sum_{x,y} Q(k,x,y)P_{Z|Y}(z|y)$.

Applying now the support lemma, we find that there exists a random variable $V$ (jointly
distributed with $(K,X,Y)$), whose alphabet size is $|\mathcal{V}| = m + 1 = |\mathcal{K}| \cdot |\mathcal{X}| \cdot |\mathcal{Y}| + 1$ and which
satisfies simultaneously:

$$\sum_{v} \Pr\{V = v\} f_i(P(\cdot|v)) = P_{KXY}(k,x,y), \quad i = 1,\ldots,m-1, \tag{42}$$

$$\sum_{v} \Pr\{V = v\} f_m(P(\cdot|v)) = H(K,X|V) - H(K,Z|V), \tag{43}$$

and

$$\sum_{v} \Pr\{V = v\} f_{m+1}(P(\cdot|v)) = H(K,X,Y|V) - H(K,Y|V). \tag{44}$$

It should be pointed out that this random variable maintains the prescribed distortion
level $Ed(X,Y)$ since $P_{XY}$ is preserved. By the same token, $H(K|Y)$ and $I(K;Y)$, which
depend only on $P_{KY}$, are preserved as well. This completes the proof of the converse part
of Theorem 4.
6 Proof of the Direct Part of Theorem 4

In this section, we show that if there exist RV's $(V, Y)$ that satisfy the conditions of Theorem
4, then for every $\epsilon > 0$, there is a sufficiently large $n$ for which
$(n, \lambda, D+\epsilon, D'+\epsilon, R_c+\epsilon, R'_c+\epsilon, h-\epsilon, h'-\epsilon)$ codes exist. One part of the proof is strongly based on a straightforward
extension of the proof of the direct part of [§] to the case of additional SI present at both 
encoder and decoder. Nonetheless, for the sake of completeness, the full details are provided 
here. It should be pointed out that for the attack-free case, an analogous extension can
easily be offered to the direct part of [8].

We first digress to establish some additional notation conventions associated with the
method of types. For a given generic finite-alphabet random variable (RV) $A \in \mathcal{A}$ (or
a vector of RV's taking on values in $\mathcal{A}$), and a vector $a^\ell \in \mathcal{A}^\ell$ ($\ell$ a positive integer), the
empirical probability mass function (EPMF) is a vector $P_{a^\ell} = \{P_{a^\ell}(a'),\ a' \in \mathcal{A}\}$, where
$P_{a^\ell}(a')$ is the relative frequency of the letter $a' \in \mathcal{A}$ in the vector $a^\ell$. Given $\delta > 0$, let
us denote the set of all $\delta$-typical sequences of length $\ell$ by $T_{P_A}^\delta$, or by $T_A^\delta$ (if there is no
ambiguity regarding the PMF that governs $A$), i.e., $T_A^\delta$ is the set of the sequences $a^\ell \in \mathcal{A}^\ell$
such that

$$(1-\delta)P_A(a') \le P_{a^\ell}(a') \le (1+\delta)P_A(a') \tag{45}$$

for every $a' \in \mathcal{A}$. For sufficiently large $\ell$, the size of $T_A^\delta$ is well-known [1] to be bounded by

$$2^{\ell[(1-\delta)H(A)-\delta]} \le |T_A^\delta| \le 2^{\ell(1+\delta)H(A)}. \tag{46}$$

It is also well-known (by the weak law of large numbers) that:

$$\Pr\{A^\ell \notin T_A^\delta\} \le \delta \tag{47}$$

for all $\ell$ sufficiently large. For a given generic channel $P_{B|A}(b|a)$ and for each $a^\ell \in T_A^\delta$, the
set of all sequences $b^\ell$ that are jointly $\delta$-typical with $a^\ell$ will be denoted by $T_{P_{B|A}}^\delta(a^\ell)$, or by
$T_{B|A}^\delta(a^\ell)$ if there is no ambiguity, i.e., $T_{B|A}^\delta(a^\ell)$ is the set of all $b^\ell$ such that:

$$(1-\delta)P_{a^\ell}(a')P_{B|A}(b'|a') \le P_{a^\ell b^\ell}(a',b') \le (1+\delta)P_{a^\ell}(a')P_{B|A}(b'|a'), \tag{48}$$

for all $a' \in \mathcal{A}$, $b' \in \mathcal{B}$, where $P_{a^\ell b^\ell}(a',b')$ denotes the fraction of occurrences of the pair
$(a', b')$ in $(a^\ell, b^\ell)$. Similarly as in eq. (45), for all sufficiently large $\ell$ and $a^\ell \in T_A^\delta$, the size of
$T_{B|A}^\delta(a^\ell)$ is bounded as follows:

$$2^{\ell[(1-\delta)H(B|A)-\delta]} \le |T_{B|A}^\delta(a^\ell)| \le 2^{\ell(1+\delta)H(B|A)}. \tag{49}$$

Finally, observe that for all $a^\ell \in T_A^\delta$ and $b^\ell \in T_{B|A}^\delta(a^\ell)$, the distortion $d(a^\ell,b^\ell) = \sum_{j=1}^{\ell} d(a_j,b_j)$
is upper bounded by:

$$d(a^\ell,b^\ell) \le \ell(1+\delta)^2 \sum_{a',b'} P_A(a')P_{B|A}(b'|a')d(a',b') = \ell(1+\delta)^2 Ed(A,B). \tag{50}$$
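
To make the typicality definition (45) concrete, here is a minimal sketch of a membership test for the $\delta$-typical set. The function names are hypothetical, and sequences are represented as Python strings or lists of hashable letters purely for illustration.

```python
from collections import Counter


def empirical_pmf(seq):
    """Relative frequency of each letter that occurs in `seq`."""
    n = len(seq)
    return {a: c / n for a, c in Counter(seq).items()}


def is_delta_typical(seq, pmf, delta):
    """Check (1-delta) P(a) <= P_seq(a) <= (1+delta) P(a) for every letter a."""
    emp = empirical_pmf(seq)
    # A letter outside the alphabet of `pmf` has P(a) = 0 and violates (45).
    if any(a not in pmf for a in emp):
        return False
    return all((1 - delta) * p <= emp.get(a, 0.0) <= (1 + delta) * p
               for a, p in pmf.items())
```

Note that under this multiplicative definition a letter with $P_A(a') = 0$ must not appear at all in a typical sequence.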

Let $(K, X, V, Y, Z)$ be a given random vector that satisfies the conditions of Theorem 4.
We now describe the mechanisms of random code selection and the encoding and decoding
operations. For a given $\epsilon > 0$, fix $\delta$ such that $2\delta + \max\{2\exp\{-2^{n\delta}\} + 2^{-n\delta},\ \delta^2\} \le \epsilon$. Define
also

$$\epsilon_1 = \delta[1 + H(V|K) + H(V|K,X)], \tag{51}$$

$$\epsilon_2 = \delta[1 + H(Y|K,V) + H(Y|K,X,V)], \tag{52}$$

and

$$\epsilon_3 = \delta[1 + H(V|K) + H(V|Z,K)]. \tag{53}$$

Generation of a rate-distortion code:

Apply the type-covering lemma [1] and construct a rate-distortion codebook that covers
$T_U^\delta$ within distortion $N(D'+\epsilon)$ w.r.t. $d'$, using $2^{NR_U(D')}$ codewords.

Generation of the encrypting bitstream:

For every $k^n \in T_K^\delta$, randomly select an index in the set $\{0, 1, \ldots, 2^{n[H(K|Y)+\delta]} - 1\}$ with a
uniform distribution. Denote by $s^J(k^n) = (s_1(k^n), \ldots, s_J(k^n))$, $s_j(k^n) \in \{0,1\}$, $j = 1,\ldots,J$,
the binary string of length $J = n[H(K|Y)+\delta]$ that represents this index. (Note that $s^J(k^n)$
can be interpreted as the output of the Slepian-Wolf encoder for $K^n$, where $Y^n$ plays the
role of SI at the decoder.)

Generation of an auxiliary embedding code:

We first construct an auxiliary code capable of embedding $2^{NR_U(D')}$ watermarks by a
random selection technique. First, $M_1 = 2^{nR_1}$, $R_1 = I(V;Z|K) - \epsilon_3 - \delta$, sequences $\{V^n(i,k^n)\}$,
$i \in \{1,\ldots,M_1\}$, are drawn independently from $T_{V|K}^\delta(k^n)$ for every $k^n \in T_K^\delta$. For every such
$k^n$, let us denote the set of these sequences by $\mathcal{C}(k^n)$. The elements of $\mathcal{C}(k^n)$ are evenly
distributed among $M_U = 2^{NR_U(D')}$ bins, each of size $M_2 = 2^{nR_2}$, $R_2 = I(X;V|K) + \epsilon_1 + \delta$
(this is possible thanks to condition (c) of Theorem 4, provided that the inequality therein
is strict). A different (encrypted) message of length $L = NR_U(D') = n\lambda R_U(D')$ bits is
attached to each bin, identifying a sub-code that represents this message. We denote the
codewords in bin number $m$ ($m \in \{1, 2, \ldots, M_U\}$) by $\{V^n(m,j,k^n)\}$, $j \in \{1, 2, \ldots, M_2\}$.

Stegotext sequence generation:

For each auxiliary sequence (in the above auxiliary codebook of each $\delta$-typical $k^n$), $V^n(m,j,k^n) =
v^n$, a set of $M_3 = 2^{nR_3}$, $R_3 = I(X;Y|V,K) + \epsilon_2 + \delta$, stegotext sequences $\{Y^n(j',v^n,k^n)\}$,
$j' \in \{1,\ldots,M_3\}$, are independently drawn from $T_{Y|VK}^\delta(v^n,k^n)$. We denote this set by
$\mathcal{C}(v^n,k^n)$.

Encoding:

Upon receiving a triple $(u^N, x^n, k^n)$, the encoder acts as follows:

1. If $u^N \in T_U^\delta$, let $w^L = (w_1,\ldots,w_L)$, $w_i \in \{0,1\}$, $i = 1,\ldots,L$, be the binary
representation of the index of the rate-distortion codeword for the message source. For
$k^n \in T_K^\delta$, let $s^J(k^n) = (s_1(k^n),\ldots,s_J(k^n))$ denote the binary representation string of the
index of $k^n$. Let $\tilde{w}^L = (\tilde{w}_1,\ldots,\tilde{w}_L)$, where $\tilde{w}_j = w_j \oplus s_j(k^n)$, $j = 1,\ldots,J$, and
$\tilde{w}_j = w_j$, $j = J+1,\ldots,L$, and where $\oplus$ denotes modulo 2 addition, i.e., the XOR
operation (see footnote 4). The binary vector $\tilde{w}^L$ is the (partially) encrypted message to be embedded.
Let $m = \sum_{i=1}^{L} \tilde{w}_i 2^{i-1} + 1$ denote the index of this message. If $u^N \notin T_U^\delta$ or $k^n \notin T_K^\delta$,
an arbitrary (error) message $\tilde{w}^L$ is generated (say, the all-zero message).

2. If $(k^n,x^n) \in T_{KX}^\delta$, find, in bin number $m$, the first $j$ such that $V^n(m,j,k^n) = v^n$
is jointly typical, i.e., $(k^n,x^n,v^n) \in T_{KXV}^\delta$, and then find the first $j'$ such that
$Y^n(j',v^n,k^n) = y^n \in \mathcal{C}(v^n,k^n)$ is jointly typical, i.e., $(k^n,x^n,v^n,y^n) \in T_{KXVY}^\delta$.
This vector $y^n$ is chosen for transmission. If $(k^n,x^n) \notin T_{KX}^\delta$, or if there is no
$V^n(m,j,k^n) = v^n$ and $Y^n(j',v^n,k^n) = y^n$ such that $(k^n,x^n,v^n,y^n) \in T_{KXVY}^\delta$, an
arbitrary vector $y^n \in \mathcal{Y}^n$ is transmitted.

Decoding:

Upon receiving $Z^n = z^n$ and $K^n = k^n$, the decoder finds all sequences $\{v^n\}$ in $\mathcal{C}(k^n)$ such
that $(k^n, v^n, z^n) \in T_{KVZ}^\delta$. If all $\{v^n\}$ found belong to the same bin, say, $\hat{m}$, then $\hat{m}$ is decoded
as the embedded message, and then the binary representation vector $\tilde{w}^L = (\tilde{w}_1,\ldots,\tilde{w}_L)$
corresponding to $\hat{m}$ is decrypted, again, by modulo 2 addition of its first $J$ bits with $s^J(k^n)$.
This decrypted binary $L$-vector is then mapped to the corresponding reproduction vector
$\hat{u}^N$ of the rate-distortion codebook for the message source. If there is no $v^n \in \mathcal{C}(k^n)$ such
that $(k^n, v^n, z^n) \in T_{KVZ}^\delta$, or if there exist two or more bins that contain such a sequence, an
error is declared.

4 Note that since $H(K)$ is assumed smaller than $\lambda R_U(D')$, then so is $H(K|Y)$, and therefore $J \le L$.
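
The partial one-time-pad step used by the encoder and undone by the decoder can be sketched as follows. This is only an illustration of the XOR operation on the first $J$ bits: the function names are hypothetical, bits are stored least significant first, and the mapping from the encrypted bit vector to the bin index $m$ is an assumed convention rather than something taken from the paper.

```python
def xor_first_bits(message_bits, key_bits):
    """XOR the first J = len(key_bits) bits of the message with the key bits.

    The remaining L - J bits are passed through unchanged. Applying the
    function twice with the same key restores the original message.
    """
    J, L = len(key_bits), len(message_bits)
    if J > L:
        raise ValueError("key string must not be longer than the message (J <= L)")
    head = [w ^ s for w, s in zip(message_bits, key_bits)]
    return head + list(message_bits[J:])


def bin_index(bits):
    """Map a bit vector to an index in {1, ..., 2^L} (assumed convention)."""
    return sum(b << i for i, b in enumerate(bits)) + 1
```

For example, with $w^L = (1,0,1,1,0)$ and $s^J = (1,1,0)$ the encrypted vector is $(0,1,1,1,0)$, and a second application of `xor_first_bits` with the same $s^J$ recovers $w^L$, which is what the legitimate decoder does after identifying the bin.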

We now turn to the performance analysis of this code in all relevant aspects. For each
triple $(k^n,x^n,u^N)$ and particular choices of the codes, the possible causes for incorrect
watermark decoding are the following:

1. $(k^n,x^n,u^N) \notin T_{KX}^\delta \times T_U^\delta$. Let the probability of this event be defined as $P_{e_1}$.

2. $(k^n,x^n,u^N) \in T_{KX}^\delta \times T_U^\delta$, but in bin no. $m$ there is no $v^n$ s.t. $(k^n,x^n,v^n) \in T_{KXV}^\delta$.
Let the probability of this event be defined as $P_{e_2}$.

3. $(k^n,x^n,u^N) \in T_{KX}^\delta \times T_U^\delta$ and in bin no. $m$ there is $v^n$ s.t. $(k^n,x^n,v^n) \in T_{KXV}^\delta$, but
there is no $y^n \in \mathcal{C}(v^n,k^n)$ s.t. $(k^n,x^n,v^n,y^n) \in T_{KXVY}^\delta$. Let the probability of this
event be defined as $P_{e_3}$.

4. $(k^n,x^n,u^N) \in T_{KX}^\delta \times T_U^\delta$ and in bin no. $m$ there is $v^n$ and $y^n \in \mathcal{C}(v^n,k^n)$ such that
$(k^n,x^n,v^n,y^n) \in T_{KXVY}^\delta$, but $(k^n,v^n,z^n) \notin T_{KVZ}^\delta$. Let the probability of this event
be defined as $P_{e_4}$.

5. $(k^n,x^n,u^N) \in T_{KX}^\delta \times T_U^\delta$ and in bin no. $m$ there is $v^n$ and $y^n \in \mathcal{C}(v^n,k^n)$ such that
$(k^n,x^n,v^n,y^n) \in T_{KXVY}^\delta$, and $(k^n,v^n,z^n) \in T_{KVZ}^\delta$, but there exists another bin, say,
no. $\tilde{m}$, that contains $\tilde{v}^n$ s.t. $(k^n,\tilde{v}^n,z^n) \in T_{KVZ}^\delta$. Let the probability of this event be
defined as $P_{e_5}$.

If none of these events occur, the message $\tilde{w}^L$ (or, equivalently, $m$) is decoded correctly
from $z^n$, the distortion between $x^n$ and $y^n$ is within $n(D+\epsilon)$ (as follows from
(50)), and the distortion between $u^N$ and its rate-distortion codeword, $\hat{u}^N$, does not
exceed $N(D'+\epsilon)$. Thus, requirements 1 and 4 (modified according to eq. ©, with $D'+\epsilon$
replacing $D'$) are both satisfied. Therefore, we first prove that the probability for none of
the events 1-5 to occur tends to unity as $n \to \infty$.
The average probability of error $P_e$ in decoding $m$ is bounded by

$$P_e \le \sum_{i=1}^{5} P_{e_i}. \tag{54}$$

The fact that $P_{e_1} \to 0$ follows immediately from (47). As for $P_{e_2}$, we have:

$$P_{e_2} \le \prod_{j=1}^{M_2} \Pr\{(k^n,x^n,V^n(m,j,k^n)) \notin T_{KXV}^\delta\}. \tag{55}$$

Now, by (46), for every $j$ and every $(k^n,x^n) \in T_{KX}^\delta$:

\begin{align*}
\Pr\{V^n(m,j,k^n) \notin T_{V|KX}^\delta(k^n,x^n)\} &= 1 - \Pr\{V^n(m,j,k^n) \in T_{V|KX}^\delta(k^n,x^n)\} \\
&= 1 - \frac{|T_{V|KX}^\delta(k^n,x^n)|}{|T_{V|K}^\delta(k^n)|} \\
&\le 1 - \frac{2^{n[(1-\delta)H(V|K,X)-\delta]}}{2^{n(1+\delta)H(V|K)}} \\
&= 1 - 2^{-n[I(X;V|K)+\epsilon_1]}. \tag{56}
\end{align*}

Substitution of (56) into (55) provides us with the following upper bound:

$$P_{e_2} \le \left[1 - 2^{-n[I(X;V|K)+\epsilon_1]}\right]^{M_2} \le \exp\left\{-2^{nR_2} \cdot 2^{-n[I(X;V|K)+\epsilon_1]}\right\} \to 0, \tag{57}$$

double-exponentially rapidly since $R_2 = I(X;V|K) + \epsilon_1 + \delta$. To estimate $P_{e_3}$, we repeat
the same technique:

$$P_{e_3} = \prod_{j'=1}^{M_3} \Pr\{(k^n,x^n,v^n,Y^n(j',v^n,k^n)) \notin T_{KXVY}^\delta\}. \tag{58}$$

Again, by the property of the typical sequences, for every $j'$ and $(k^n,x^n,v^n) \in T_{KXV}^\delta$,

$$\Pr\{Y^n(j',v^n,k^n) \notin T_{Y|KXV}^\delta(k^n,x^n,v^n)\} \le 1 - 2^{-n[I(X;Y|V,K)+\epsilon_2]}, \tag{59}$$

and therefore, substitution of (59) into (58) gives

$$P_{e_3} \le \left[1 - 2^{-n[I(X;Y|V,K)+\epsilon_2]}\right]^{M_3} \le \exp\left\{-2^{nR_3} \cdot 2^{-n[I(X;Y|V,K)+\epsilon_2]}\right\} \to 0, \tag{60}$$

double-exponentially rapidly since $R_3 = I(X;Y|V,K) + \epsilon_2 + \delta$. The estimation of $P_{e_4}$ is again
based on properties of typical sequences. Since $Z^n$ is the output of a memoryless channel
$P_{Z|Y}$ with input $y^n = Y^n(j',v^n,k^n)$ and by the assumption of this step $(k^n,x^n,v^n,y^n) \in
T_{KXVY}^\delta$, from (47) and the Markov lemma [3, Lemma 14.8.1], we obtain

$$P_{e_4} = \Pr\{(k^n,x^n,v^n,y^n,Z^n) \notin T_{KXVYZ}^\delta\} \le \delta, \tag{61}$$

and similarly to $P_{e_1}$, $P_{e_4}$ can be made as small as desired by an appropriate choice of $\delta$.
Finally, we estimate $P_{e_5}$ as follows:

\begin{align*}
P_{e_5} &= \Pr\{\exists\, \tilde{m} \ne m:\ (k^n,V^n(\tilde{m},j,k^n),z^n) \in T_{KVZ}^\delta\} \tag{62}\\
&\le \sum_{\tilde{m} \ne m,\ j \in \{1,2,\ldots,M_2\}} \Pr\{(k^n,V^n(\tilde{m},j,k^n),z^n) \in T_{KVZ}^\delta\} \\
&= (2^{NR_U(D')} - 1)\,2^{nR_2} \Pr\{(k^n,V^n(\tilde{m},j,k^n),z^n) \in T_{KVZ}^\delta\} \\
&\le 2^{nR_1}\,2^{-n[I(V;Z|K)-\epsilon_3]}. \tag{63}
\end{align*}

Now, since $R_1 = I(V;Z|K) - \epsilon_3 - \delta$, $P_{e_5} \to 0$. Since $P_{e_i} \to 0$ for $i = 1,\ldots,5$, their sum
tends to zero as well, implying that there exists at least one choice of an auxiliary code and
related stegotext codes that give rise to the reliable decoding of $\tilde{W}^L$.
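
The double-exponential decay in bounds of the form $[1-p]^{M} \le \exp\{-Mp\}$, with $p = 2^{-n[I+\epsilon_1]}$ and $M = 2^{n[I+\epsilon_1+\delta]}$, can be illustrated numerically in the log domain. The sketch below uses hypothetical parameter values and a hypothetical function name; it only evaluates the two sides of the inequality and does not simulate the random code.

```python
import math


def log_miss_probability(n, mutual_info, eps1, delta):
    """Natural log of (1-p)^M and of its bound exp(-M p).

    p = 2^{-n(I + eps1)} lower-bounds the chance that one random codeword is
    jointly typical; M = 2^{n(I + eps1 + delta)} is the bin size.
    """
    p = 2.0 ** (-n * (mutual_info + eps1))
    M = 2.0 ** (n * (mutual_info + eps1 + delta))
    exact_log = M * math.log1p(-p)   # log of (1 - p)^M
    bound_log = -M * p               # log of exp(-M p), i.e. about -2^{n delta}
    return exact_log, bound_log
```

The bound's exponent equals $-2^{n\delta}$, so the probability that no codeword in the bin is jointly typical vanishes double-exponentially in $n$ for any fixed $\delta > 0$.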

Now, let us denote by $N_c$ the total number of composite sequences in a codebook that
corresponds to a $\delta$-typical $k^n$. Then,

\begin{align*}
N_c &= M_U \cdot M_2 \cdot M_3 \\
&= 2^{n[\lambda R_U(D') + I(X;V|K) + I(X;Y|V,K) + \epsilon_1 + \epsilon_2 + 2\delta]} \\
&= 2^{n[\lambda R_U(D') + I(X;Y,V|K) + \epsilon_1 + \epsilon_2 + 2\delta]}. \tag{64}
\end{align*}

Thus,

\begin{align*}
H(Y^n|K^n) &\le \log N_c \\
&= n[\lambda R_U(D') + I(X;Y,V|K) + \epsilon_1 + \epsilon_2 + 2\delta] \\
&\le n(R'_c + \epsilon_1 + \epsilon_2 + 2\delta), \tag{65}
\end{align*}

where in the last inequality we have used condition (e). For sufficiently small values of $\delta$
(and hence of $\epsilon_1$ and $\epsilon_2$), $\epsilon_1 + \epsilon_2 + 2\delta \le \epsilon$, and so the compressibility requirement in the
presence of $K^n$ is satisfied.

We next prove the achievability of $R_c$. Let us consider the set of $\delta$-typical key sequences
$T_K^\delta$, and view it as the union of 0-typical sets (i.e., $\delta$-typical sets with $\delta = 0$), $\{T_{Q_K}\}$, where
$Q_K$ exhausts the set of all rational PMF's with denominator $n$, and with the property

$$(1-\delta)P_K(k) \le Q_K(k) \le (1+\delta)P_K(k), \quad \forall k \in \mathcal{K}. \tag{66}$$

Suppose that we have already randomly selected a codebook for one representative member
$\tilde{k}^n$ of each type class $T_{Q_K} \subset T_K^\delta$ using the mechanism described above. Now, consider the
set of all permutations from $\tilde{k}^n$ to every other member of $T_{Q_K}$. The auxiliary codebook
and the stegotext codebooks for every other key sequence $k^n \in T_{Q_K}$ will be obtained by
permuting all (auxiliary and stegotext) codewords of those corresponding to $\tilde{k}^n$ according
to the same permutation that leads from $\tilde{k}^n$ to $k^n$ (thus preserving all the necessary joint
typicality properties). Now, in the union of all stegotext codebooks, corresponding to all
typical key sequences, each codeword will appear at least $(n+1)^{-|\mathcal{K}| \cdot |\mathcal{Y}|} \cdot 2^{n[(1-\delta)H(K|Y)-\delta]}$
times, which is a lower bound to the number of permutations of $\tilde{k}^n$ which leave a given
stegotext codeword $y^n$ unaltered. The total number of stegotext codewords, $N_Y$, in all
codebooks of all $\delta$-typical key sequences (including repetitions) is upper bounded by

\begin{align*}
N_Y &= |T_K^\delta| \cdot N_c \\
&\le 2^{n[(1+\delta)H(K)+\delta]} \cdot 2^{n[\lambda R_U(D') + I(X;Y,V|K) + \epsilon_1 + \epsilon_2 + 2\delta]} \\
&= 2^{n[H(K) + \lambda R_U(D') + I(X;Y,V|K) + \epsilon_1 + \epsilon_2 + \delta(H(K)+3)]}. \tag{67}
\end{align*}

Let $\mathcal{C}$ denote the union of all stegotext codebooks, namely, the set of all distinct stegotext
vectors across all codebooks corresponding to all $k^n \in T_K^\delta$, and let $N(y^n)$ denote the number
of occurrences of a given vector $y^n \in \mathcal{Y}^n$ in all stegotext codebooks. Then, in view of the
above combinatorial consideration, we have

$$N_Y = \sum_{y^n \in \mathcal{C}} N(y^n) \ge |\mathcal{C}| \cdot (n+1)^{-|\mathcal{K}| \cdot |\mathcal{Y}|} \cdot 2^{n[(1-\delta)H(K|Y)-\delta]}. \tag{68}$$

Combining eqs. (67) and (68), we have

$$\log|\mathcal{C}| \le n[\lambda R_U(D') + I(X;Y,V|K) + I(K;Y) + \delta'], \tag{69}$$

where

$$\delta' = \epsilon_1 + \epsilon_2 + \delta(H(K) + H(K|Y) + 4) + |\mathcal{K}| \cdot |\mathcal{Y}| \cdot \frac{\log(n+1)}{n}, \tag{70}$$

which is arbitrarily small provided that $\delta$ is sufficiently small and $n$ is sufficiently large.
Thus, the rate required for public compression of $Y^n$ (without the key), which is $(\log|\mathcal{C}|)/n$,
is arbitrarily close to $[\lambda R_U(D') + I(X;Y,V|K) + I(K;Y)]$, which in turn is upper bounded
by $R_c$, by condition (d) of Theorem 4.
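
The bookkeeping that turns the counting bounds on $N_Y$ into the rate bound on $(\log|\mathcal{C}|)/n$ is simple exponent arithmetic, and it can be checked with a short sketch. The function and argument names below are hypothetical, all entropies are assumed to be in bits, and the numbers in any call are illustrative only.

```python
import math


def delta_prime(n, delta, eps1, eps2, h_k, h_k_given_y, k_size, y_size):
    """Slack term: eps1 + eps2 + delta (H(K) + H(K|Y) + 4) + |K||Y| log(n+1)/n."""
    return (eps1 + eps2 + delta * (h_k + h_k_given_y + 4)
            + k_size * y_size * math.log2(n + 1) / n)


def public_rate_bound(n, delta, eps1, eps2, lam_r_u, i_x_yv_k,
                      h_k, h_k_given_y, k_size, y_size):
    """Upper bound on (log |C|)/n: exponent of the upper bound on N_Y minus
    the exponent of the per-codeword multiplicity lower bound."""
    log_ny_upper = (h_k + lam_r_u + i_x_yv_k
                    + eps1 + eps2 + delta * (h_k + 3))
    log_multiplicity = ((1 - delta) * h_k_given_y - delta
                        - k_size * y_size * math.log2(n + 1) / n)
    return log_ny_upper - log_multiplicity
```

The difference of the two exponents equals $\lambda R_U(D') + I(X;Y,V|K) + I(K;Y) + \delta'$, with $I(K;Y) = H(K) - H(K|Y)$, and $\delta'$ shrinks as $n$ grows and $\delta$ decreases.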

Before we proceed to evaluate the equivocation levels, an important comment is in
order in the context of public compression (and a similar comment will apply to private
compression): Note that a straightforward (and not necessarily optimal) method for public
compression of $Y^n$ is simply according to its index within $T_Y^\delta$, which requires about $nH(Y)$
bits. On the other hand, the converse theorem tells us that the compressed representation
of $Y^n$ cannot be much shorter than $n[\lambda R_U(D') + I(X;Y,V|K) + I(K;Y)]$ bits (cf. the
necessity of condition (d) of Theorem 4). Thus, a contradiction between these two facts is
avoided only if

$$\lambda R_U(D') + I(X;Y,V|K) + I(K;Y) \le H(Y), \tag{71}$$

or, equivalently,

$$\lambda R_U(D') + I(X;Y,V|K) \le H(Y|K). \tag{72}$$

This means that any achievable point $(D, D', R_c, R'_c, h, h')$ corresponds to a choice of random
variables $(K,X,Y,V)$ that must inherently satisfy eq. (72). This observation will now help
us also in estimating the equivocation levels.

Consider first the equivocation w.r.t. the reproduction, for which we have the following
chain of inequalities:

\begin{align*}
Nh' &\le nH(K|Y) \tag{73}\\
&= nH(K) - nI(K;Y) \\
&= H(K^n) - nI(K;Y) \tag{74}\\
&= H(K^n|Y^n,Z^n) + I(K^n;Y^n,Z^n) - nI(K;Y) \\
&= H(K^n|Y^n,Z^n) + I(K^n;Y^n) - nI(K;Y) \tag{75}\\
&= H(K^n|Y^n,Z^n) + H(Y^n) - H(Y^n|K^n) - nI(K;Y) \\
&\le H(K^n|Y^n,Z^n) + n[\lambda R_U(D') + I(X;Y,V|K) + I(K;Y) + \epsilon] \\
&\qquad - n[\lambda R_U(D'+\epsilon) + I(X;Y,V|K) - \epsilon] - nI(K;Y) \tag{76}\\
&= H(K^n|Y^n,Z^n) + n\lambda[R_U(D') - R_U(D'+\epsilon)] + 2n\epsilon \\
&= H(K^n|Y^n,Z^n) + n\epsilon' \\
&= I(K^n;\hat{U}^N|Y^n,Z^n) + H(K^n|Y^n,Z^n,\hat{U}^N) + n\epsilon' \\
&\le H(\hat{U}^N|Y^n,Z^n) + H(K^n|Y^n,Z^n,\hat{U}^N) + n\epsilon' \tag{77}
\end{align*}

where (73) is based on condition (b), (74) is due to the memorylessness of $K^n$, (75) follows
from the fact that $K^n \to Y^n \to Z^n$ is a Markov chain, (76) is due to the sufficiency of
condition (d) (that we have just proved) and the necessity of condition (e), and $\epsilon'$ vanishes
as $\epsilon \to 0$ due to the continuity of $R_U(\cdot)$. Comparing the left-most side and the right-most
side of the above chain of inequalities, we see that to prove that $H(\hat{U}^N|Y^n,Z^n)$ is essentially
at least as large as $Nh'$, it remains to show that $H(K^n|Y^n,Z^n,\hat{U}^N)$ is small, say,

$$H(K^n|Y^n,Z^n,\hat{U}^N) \le n\epsilon' \tag{78}$$

for large $n$. We next focus then on the proof of eq. (78).
First, consider the following chain of inequalities:

\begin{align*}
H(K^n|Y^n,Z^n,\hat{U}^N) &\le H(K^n,S^J(K^n)|Y^n,Z^n,\hat{U}^N) \\
&= H(S^J(K^n)|Y^n,Z^n,\hat{U}^N) + H(K^n|S^J(K^n),Y^n,Z^n,\hat{U}^N) \\
&\le H(S^J(K^n)|Y^n,\hat{U}^N,W^L) + H(K^n|S^J(K^n),Y^n), \tag{79}
\end{align*}

where the second inequality follows from the fact that $W^L$ is a function of $\hat{U}^N$ and the fact
that conditioning reduces entropy. As for the second term of the right-most side, we have
by Fano's inequality

$$H(K^n|S^J(K^n),Y^n) \le 1 + P_{\mathrm{err}} \cdot n\log|\mathcal{K}| \le n\epsilon'/2 \quad \text{for large enough } n, \tag{80}$$

where $P_{\mathrm{err}} \to 0$ is the probability of error associated with the Slepian-Wolf decoder that
estimates $K^n$ from its compressed version, $S^J(K^n)$, and the "side information," $Y^n$. As for
the first term of the right-most side of (79), we have

\begin{align*}
H(S^J(K^n)|Y^n,\hat{U}^N,W^L) &= H(W^L \oplus \tilde{W}^L|Y^n,\hat{U}^N,W^L) \\
&\le H(\tilde{W}^L|Y^n). \tag{81}
\end{align*}

It remains to show that $H(\tilde{W}^L|Y^n) \le n\epsilon'/2$ as well. In order to show this, we have to
demonstrate that for a good code, once $Y^n$ is given, there is very little uncertainty with
regard to $\tilde{W}^L$, which is the index of the bin.

To this end, let us suppose that the inequality in (72) is strict (otherwise, we can
slightly increase the allowable distortion level $D'$ and thus reduce $R_U(D')$). As we prove in
the Appendix, for any given (arbitrarily small) $\gamma > 0$,

Pr{3 y n in the code of k n that appears in more than 2^ bins} < \y\^2-^- lo ^ e ^ n \ (82) 

that is, a double-exponential decay. The probability of the union of these events across all 
representatives {k n } of all Tq C Tj^ will just be multiplied by the number of {Tq } in 
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Tj(, which is polynomial, and hence will continue to decay double— exponentially. Let us 
define then the event 

{3 y n in the stego-codebook of some k n that appears in more than 2™ 7 bins} 

as yet another error event (like the error events 1-5) that occurs with very small probability.
Assume then, that the randomly selected codebook is "good" in the sense that no stegovector
appears in more than $2^{n\gamma}$ bins, for any of the representatives $\{k^n\}$. Now, given $y^n$, how many
candidate bins (corresponding to encrypted messages $\{\tilde{w}^L\}$) can be expected at most? For
a given $y^n$, let us confine attention to the $\delta$-conditional type class $T^\delta_{K|Y}(y^n)$ (key sequences
outside this set cannot have $y^n$ in their codebooks, as they are not jointly $\delta$-typical with $y^n$).
The conditional $\delta$-type class $T^\delta_{K|Y}(y^n)$ can be partitioned into conditional 0-type classes
$\{T_{Q_{K|Y}}(y^n)\}$, where $Q_{K|Y}$ exhausts the allowed $\delta$-tolerance in the conditional distribution
around $P_{K|Y}$, in the same spirit as before. Now, take an arbitrary representative $k^n$ from
a given $T_{Q_{K|Y}}(y^n)$, and consider the set of all permutations that lead from this representative
to all other members of $T_{Q_{K|Y}}(y^n)$. Obviously, the stego-codebooks of all those members have
exactly the same configuration of occurrences of $y^n$ as that of the representative (since these
permutations leave $y^n$ unaltered), therefore they belong to exactly the same bins as in the
codebook of the representative, the number of which is at most $2^{n\gamma}$, by the hypothesis that
we are using a good code. In other words, as $k^n$ scans $T_{Q_{K|Y}}(y^n)$, there will be no new bins
that contain $y^n$ relative to those that are already in the codebook of the representative.
New bins that contain $y^n$ can be seen then only by scanning the other conditional 0-types
$\{T_{Q_{K|Y}}(y^n)\}$ within $T^\delta_{K|Y}(y^n)$, but the number of such conditional 0-types does not exceed
the total number of conditional 0-types, which is upper bounded, in turn, by
$(n+1)^{|\mathcal{K}|\cdot|\mathcal{Y}|}$. Thus, the totality of stego-codebooks, for all relevant $\{k^n\}$, cannot give
more than $(n+1)^{|\mathcal{K}|\cdot|\mathcal{Y}|} \cdot 2^{n\gamma}$ distinct bins altogether. In other words, for a good codebook:

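$$ H(\tilde{W}^L|Y^n) \le \log\left[(n+1)^{|\mathcal{K}|\cdot|\mathcal{Y}|} \cdot 2^{n\gamma}\right] = n\left[\gamma + |\mathcal{K}|\cdot|\mathcal{Y}|\cdot\frac{\log(n+1)}{n}\right], \tag{83} $$

which is less than $n\epsilon'/2$ for an appropriate choice of $\gamma$ and for large enough $n$.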
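As a quick numerical illustration of the bound in (83), the following minimal sketch evaluates the normalized exponent $\gamma + |\mathcal{K}|\cdot|\mathcal{Y}|\cdot\log(n+1)/n$ for growing $n$. The function name and the sample values (binary alphabets, $\gamma = 0.01$) are assumptions made only for illustration; they are not taken from the paper.

```python
import math

def bin_count_exponent(n, gamma, k_size, y_size):
    """Normalized log2 of (n+1)^(|K||Y|) * 2^(n*gamma), i.e. the bracket in (83)."""
    return gamma + k_size * y_size * math.log2(n + 1) / n

# Hypothetical illustration values: binary key and stegotext alphabets.
for n in (10**2, 10**4, 10**6):
    print(n, bin_count_exponent(n, gamma=0.01, k_size=2, y_size=2))
```

The polynomial factor contributes a term of order $\log(n+1)/n$, so the exponent approaches $\gamma$ as $n$ grows, which is why a small enough $\gamma$ suffices.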
Finally, for the equivocation w.r.t. the original message source, we have the following:

\begin{align}
H(U^N|Y^n,Z^n) &= H(\hat{U}^N|Y^n,Z^n) + H(U^N|Y^n,Z^n) - H(\hat{U}^N|Y^n,Z^n) \nonumber\\
&\ge nH(K|Y) - 2n\epsilon' + H(U^N|Y^n,Z^n) - H(\hat{U}^N|Y^n,Z^n) \nonumber\\
&= nH(K|Y) + H(U^N) - I(U^N;\hat{U}^N) - I(U^N;Y^n,Z^n) \nonumber\\
&\quad - H(\hat{U}^N|U^N) + I(\hat{U}^N;Y^n,Z^n) - 2n\epsilon' \nonumber\\
&\ge nH(K|Y) + H(U^N) - H(\hat{U}^N) - I(U^N;Y^n,Z^n) \nonumber\\
&\quad - H(\hat{U}^N|U^N) + I(\hat{U}^N;Y^n,Z^n) - 2n\epsilon' \nonumber\\
&\ge \left[nH(K|Y) + NH(U) - NR_U(D') - 2n\epsilon'\right] \nonumber\\
&\quad - \left[I(U^N;Y^n,Z^n) + H(\hat{U}^N|U^N) - I(\hat{U}^N;Y^n,Z^n)\right], \tag{84}
\end{align}

where the first inequality is due to the fact that $H(\hat{U}^N|Y^n,Z^n) \ge n[H(K|Y) - 2\epsilon']$, which we
have just shown, and the third is due to the memorylessness of $\{U_i\}$ and the fact that the
rate-distortion codebook size is $2^{NR_U(D')}$, so that $H(\hat{U}^N) \le NR_U(D')$. Now, the second
bracketed expression on the right-most side is the same as in eq. (33), where in the case
of this specific scheme, both inequalities in (33) become equalities, i.e., this expression
vanishes. This is because in our scheme, $U^N \to \hat{U}^N \to (Y^n,Z^n)$ is a Markov chain (and so,
the first inequality of (33) is tight) and because $H(\hat{U}^N|U^N,Y^n,Z^n) \le H(\hat{U}^N|U^N) = 0$ (as
$\hat{U}^N$ is a deterministic function of $U^N$), which makes the second inequality of (33) tight. As
a result, we have

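\begin{align}
H(U^N|Y^n,Z^n) &\ge N[H(K|Y)/\lambda + H(U) - R_U(D') - 2\epsilon'/\lambda] \nonumber\\
&\ge N[h + R_U(D') - H(U) + H(U) - R_U(D') - 2\epsilon'/\lambda] \nonumber\\
&= N(h - 2\epsilon'/\lambda), \tag{85}
\end{align}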

where we have used condition (a). This completes the proof of the direct part. 
Appendix



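Proof of eq. (82). The probability of obtaining $y^n$ in a single random selection within the
codebook of $k^n$ is given by

\begin{align}
\Pr\{Y^n(j',V^n(m,j,k^n),k^n) = y^n\} &= \frac{|T^\delta_{V|KY}(k^n,y^n)|}{|T^\delta_{V|K}(k^n)|} \cdot \frac{1}{|T^\delta_{Y|KV}(k^n,v^n)|} \tag{A.1}\\
&\le \frac{2^{n(1+\delta)H(V|K,Y)}}{2^{n[(1-\delta)H(V|K)-\delta]}} \cdot \frac{1}{2^{n[(1-\delta)H(Y|K,V)-\delta]}} \nonumber\\
&= 2^{-n[H(Y|K)-\delta'']}, \tag{A.2}
\end{align}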

where the first factor in the right-hand side of (A.1) is the probability of having $V^n(m,j,k^n) = v^n$
that is typical with $y^n$ and $k^n$ (a necessary condition for this $v^n$ to generate the given
$y^n$), the second factor is the probability of selecting a given $y^n$ in the random selection of
the stegotext code, and where

$$ \delta'' = \delta[H(V|K,Y) + H(V|K) + H(Y|K,V) + 2]. \tag{A.3} $$

It now follows that the probability $q$ for at least one occurrence of $y^n$ among the stegowords
corresponding to a certain bin, in the codebook of $k^n$, is upper bounded (using the union
bound) by

\begin{align}
q &\le M_2 \cdot M_3 \cdot 2^{-n[H(Y|K)-\delta'']} \nonumber\\
&= 2^{-n[H(Y|K)-I(X;V|K)-I(X;Y|V,K)-\delta''-2\delta-\epsilon_1-\epsilon_2]} \nonumber\\
&= 2^{-n[H(Y|K)-I(X;V,Y|K)-\delta''-2\delta-\epsilon_1-\epsilon_2]} \nonumber\\
&\triangleq 2^{-n[H(Y|K)-I(X;Y,V|K)-\delta_1]}. \tag{A.4}
\end{align}

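We are interested in upper bounding the probability that a given $y^n$ appears as a stegoword
in more than $2^{n\gamma}$ bins in the codebook of $k^n$, for a given $\gamma > 0$. For $i = 1, \ldots, M_U$, let
$A_i \in \{0,1\}$ be the indicator function of the event

$$ \{y^n \text{ appears as a stegoword in bin no. } i \text{ at least once}\}. $$

Then, clearly $\{A_i\}$ are i.i.d. with $\Pr\{A_i = 1\} = q$. Therefore,

\begin{align}
\Pr\left\{\sum_{i=1}^{M_U} A_i \ge 2^{n\gamma}\right\} &\le \exp_2\left\{-M_U D\left(\frac{2^{n\gamma}}{M_U} \Big\| q\right)\right\} \nonumber\\
&= \exp_2\left\{-M_U D\left(2^{-n[\lambda R_U(D')-\gamma]} \| q\right)\right\}, \tag{A.5}
\end{align}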
where for $\alpha, \beta \in [0,1]$, the function $D(\alpha\|\beta)$ designates the binary divergence

$$ D(\alpha\|\beta) = \alpha\log\frac{\alpha}{\beta} + (1-\alpha)\log\frac{1-\alpha}{1-\beta}. \tag{A.6} $$
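The bound in (A.5) has the form of the standard Chernoff bound for a sum of i.i.d. binary variables, expressed through the binary divergence of (A.6). The sketch below compares that bound with the exact binomial tail. The parameters (200 bins, $q = 0.01$, threshold 10) and all function names are hypothetical, chosen only so that the exact tail is cheap to compute; they are not the paper's values.

```python
import math

def binary_divergence(alpha, beta):
    """Binary divergence D(alpha || beta) in bits, for 0 < alpha, beta < 1."""
    return (alpha * math.log2(alpha / beta)
            + (1 - alpha) * math.log2((1 - alpha) / (1 - beta)))

def exact_tail(m, q, t):
    """Pr{sum of m i.i.d. Bernoulli(q) indicators >= t}, computed exactly."""
    return sum(math.comb(m, i) * q**i * (1 - q)**(m - i) for i in range(t, m + 1))

def chernoff_bound(m, q, t):
    """2^{-m D(t/m || q)}; an upper bound on the tail when t/m > q."""
    return 2.0 ** (-m * binary_divergence(t / m, q))

M, Q, T = 200, 0.01, 10  # hypothetical illustration values
print(exact_tail(M, Q, T), chernoff_bound(M, Q, T))
```

In the proof the number of bins and the threshold both grow exponentially with $n$, so only the bound, not the exact tail, is tractable there.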
Now, referring to eq. (72), suppose that

$$ H(Y|K) \ge \lambda R_U(D') + I(X;V,Y|K) + \delta_1 + 2\gamma. \tag{A.7} $$

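Then, clearly,

$$ 2^{-n[\lambda R_U(D')-\gamma]} \ge 2^{-n[H(Y|K)-I(X;Y,V|K)-\delta_1]} \ge q, \tag{A.8} $$

and so, $\Pr\{\sum_{i=1}^{M_U} A_i \ge 2^{n\gamma}\}$ is further upper bounded by

$$ \Pr\left\{\sum_{i=1}^{M_U} A_i \ge 2^{n\gamma}\right\} \le \exp_2\left\{-M_U D\left(2^{-n[\lambda R_U(D')-\gamma]} \| 2^{-n[H(Y|K)-I(X;Y,V|K)-\delta_1]}\right)\right\}. \tag{A.9} $$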

To further bound this expression from above, we have to get a lower bound to an expression
of the form $D(2^{-na}\|2^{-nb})$ for $0 < a < b$. Applying the inequality
$\log(1+x) = -\log\left(1 - \frac{x}{1+x}\right) \ge \frac{x\log e}{1+x}$ for $x > -1$, we have:

\begin{align}
D(2^{-na}\|2^{-nb}) &= 2^{-na}\log\frac{2^{-na}}{2^{-nb}} + (1-2^{-na})\log\frac{1-2^{-na}}{1-2^{-nb}} \nonumber\\
&= n(b-a)2^{-na} + (1-2^{-na})\log\left(1 + \frac{2^{-nb}-2^{-na}}{1-2^{-nb}}\right) \nonumber\\
&\ge n(b-a)2^{-na} + (2^{-nb}-2^{-na})\log e \nonumber\\
&\ge [n(b-a) - \log e]\,2^{-na}. \tag{A.10}
\end{align}
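The lower bound in (A.10) can be sanity-checked numerically. The following minimal sketch, with logarithms taken to base 2, compares $D(2^{-na}\|2^{-nb})$ against $[n(b-a) - \log e]\,2^{-na}$ over a range of $n$. The values $a = 0.3$ and $b = 0.7$ and the function names are arbitrary assumptions for illustration only.

```python
import math

def binary_divergence(alpha, beta):
    """Binary divergence D(alpha || beta) in bits, for 0 < alpha, beta < 1."""
    return (alpha * math.log2(alpha / beta)
            + (1 - alpha) * math.log2((1 - alpha) / (1 - beta)))

def divergence_lower_bound(n, a, b):
    """Right-most side of (A.10): [n(b - a) - log2(e)] * 2^(-n a)."""
    return (n * (b - a) - math.log2(math.e)) * 2.0 ** (-n * a)

a, b = 0.3, 0.7  # arbitrary illustration values with 0 < a < b
for n in (1, 10, 20, 40):
    d = binary_divergence(2.0 ** (-n * a), 2.0 ** (-n * b))
    print(n, d, divergence_lower_bound(n, a, b))
```

For small $n$ the lower bound is negative and therefore trivial; it becomes informative once $n(b-a)$ exceeds $\log e$.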

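Applying this inequality with $a = \lambda R_U(D') - \gamma$ and $b = H(Y|K) - I(X;Y,V|K) - \delta_1$, for which
$b - a \ge \gamma$ by (A.7), we get

$$ D\left(2^{-n[\lambda R_U(D')-\gamma]} \| 2^{-n[H(Y|K)-I(X;Y,V|K)-\delta_1]}\right) \ge (n\gamma - \log e)\,2^{-n[\lambda R_U(D')-\gamma]}, \tag{A.11} $$

and so,

$$ \Pr\left\{\sum_{i=1}^{M_U} A_i \ge 2^{n\gamma}\right\} \le 2^{-(n\gamma-\log e)2^{n\gamma}}, \tag{A.12} $$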

which decays double-exponentially with $n$. While this inequality holds for a given
$y^n$, the probability that $\sum_{i=1}^{M_U} A_i \ge 2^{n\gamma}$ for some $y^n \in \mathcal{Y}^n$ would be upper bounded, using
the union bound, by $|\mathcal{Y}|^n \cdot 2^{-(n\gamma-\log e)2^{n\gamma}}$, which still decays double-exponentially. Thus,
with very high probability the random selection of stegovectors, for $k^n$, is such that no stego
codevector $y^n$ appears in more than $2^{n\gamma}$ bins.
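To see why the union bound over all $y^n$ does not spoil the double-exponential decay, one can look at the base-2 logarithm of the final bound, $n\log|\mathcal{Y}| - (n\gamma - \log e)\,2^{n\gamma}$. The sketch below evaluates it for a few block lengths. The alphabet size 4, the value $\gamma = 0.05$, and the function name are assumptions chosen only for illustration.

```python
import math

def log2_union_bound(n, gamma, y_size):
    """log2 of |Y|^n * 2^{-(n*gamma - log2 e) * 2^(n*gamma)}."""
    return n * math.log2(y_size) - (n * gamma - math.log2(math.e)) * 2.0 ** (n * gamma)

# Hypothetical illustration values: |Y| = 4, gamma = 0.05.
for n in (100, 200, 400):
    print(n, log2_union_bound(n, gamma=0.05, y_size=4))
```

With these values the bound is still vacuous (positive logarithm) at $n = 100$, but the exponential factor $|\mathcal{Y}|^n$ is quickly overwhelmed by the double-exponential term as $n$ grows.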
Figure 1: A generic watermarking/encryption system.
Figure 2: The proposed watermarking/encryption scheme (general case).